0. INTRODUCTION


Canonical correlation analysis plays an important role in applied
research since it provides a method of creating meaningful bivariate
random variables from two groups of random variables, each consisting
possibly of a large number of components. The main purpose of this study
is to investigate approaches to the problems of construction
and statistical inference that arise when one tries to extend the theory
of canonical correlation to more than two sets of variables.

Anderson (1958) suggested an interesting constructional approach
for extending the theory of canonical variables. Though Steel (1951) and
Kettenring (1971) have attempted to construct generalized canonical
variables based on Anderson's approach, their results seem formidable for
practical use, as had been anticipated by Anderson. This paper presents
a simple solution to Anderson's problem after modifying it with a useful
constraint. Problems of statistical inference related to generalized
canonical variables are also formulated, and solutions to some of them are
presented. As an application, important practical problems
similar to the one posed by Gnanadesikan (1977, p. 77) can be formulated as
hypothesis testing problems and appropriate tests can be performed.

From a review of the existing literature on generalizations of
canonical correlations, it was found that all the approaches to constructing
generalized canonical variables, except that due to Vinograde (1950), have
two natural properties: (1) they reduce to the classical method
when the number of groups is only two and (2) the criterion for selection
of generalized canonical variables optimizes some function of their
correlation matrix. No work seems to have been done on the problems
of statistical inference associated with the generalization of canonical
variables.

Section  2  introduces  a  new  approach  for  the  construction  of 
generalized  canonical  variables.  The  constraint  of  equi-correlation  is 
discussed  in  the  light  of  a  variety  of  results  and  applications  while 
the  criterion  of  minimum  generalized  variance  is  shown  to  be  of  special 
interest  through  its  statistical  interpretation  and  examples  of 
applications  in  a  wide  variety  of  fields.  A  brief  formulation  of  the 
problems  of  statistical  inference  to  be  investigated  in  this  research  is 
also  presented  there. 

In the case of known population parameters, a derivation of the new
generalized canonical variables is given in Section 3. Statistical
estimation of these variables when the population parameters are unknown is
given next, and the interesting theory for the singular case concludes
this section.

For the new generalized canonical variables, tests of hypotheses
concerning their equi-correlation coefficient and generalized variance
are of special interest. Using a large-sample approximation for the
distribution of the new variables in the case of a normal population, various
tests for the above hypotheses are proposed in Section 4. Some properties
of these tests and their relationship with the likelihood ratio test
for the correlation coefficient are also investigated in this section.

Tests for deciding whether the new generalized canonical variable
of a particular size is as good as the one with a larger number of
elements are given in Section 5. Also in this section, tests for
comparing several new generalized canonical variables in some special
cases are given using isotonic regression techniques.

Finally, some discussions and topics for future research are presented
in Section 7.


1.  DEVELOPMENT  OF  GENERALIZED  CANONICAL  VARIABLES 


1.1 Genesis:

A random vector of interest, in many situations, poses both
analytical and economical problems if it has too many components. The
objective creation of a new random vector with a much smaller number of
components from the original one with a large number of components can
often be of great help in applied research. The first principal component
in effect seeks a linear combination of the components of a
p-dimensional random vector which explains the maximum variance among the
set of all possible standardized linear functions. This procedure has
been extended by Gnanadesikan and Wilk (1969) to search for a non-linear
combination, giving rise to the non-linear first principal component. So, a
p-dimensional vector can be represented by just a scalar variable. Very
often, the components of the original vector can be meaningfully grouped
into several disjoint subvectors. In the case of two groups, Hotelling
(1936) proposed canonical variables, where the first canonical variable
is a two-component vector, each element being a standardized linear
combination of the corresponding group, such that the correlation between
the two standardized linear combinations is maximum. The main
advantage of principal component and canonical correlation analysis is
that they reduce the problem of dimensionality and, further, try to
explain the variation in the set to a great extent in a meaningful way.
The method of canonical correlation has been widely applied in economics,
marketing and in most of the behavioral and social sciences.

Hotelling noted the need for the extension of his canonical
variables to the case of several groups. The object here is the same as
above. However, the criterion for the choice of the variables needs to
be carefully defined.

For the case of k groups, it is desirable that the criterion
for the choice of the variates should be some function of their
correlation matrix. Various such criteria, and hence generalizations of
canonical variables, have been proposed.

The case of a single variable per set has been studied by many
authors, though from different viewpoints, e.g. Horst (1936) and
McKeon (1966).

Kettenring (1971) has considered the algebraic derivation of the
generalized canonical variables for more than two sets with several
variables per set. An expository discussion of generalizations of canonical
variables is given in Sen Gupta (1981c).


2.  INTRODUCTION  TO  NEW  GENERALIZED  CANONICAL  VARIABLES 


2.1 Definitions:

In the previous section it was seen that various attempts to
generalize canonical variables have been made. Though no formal
definition of a generalized canonical variable (GCV) seems to exist, in the
spirit of the above attempts the following definition can be given.

Definition 1: Suppose a random vector $X$ can be partitioned
meaningfully into k disjoint groups or sub-vectors, $X_{(1)}, \ldots, X_{(k)}$.
The first GCV $Y^{(1)}$, with
$Y^{(1)} = [Y_1^{(1)}, \ldots, Y_k^{(1)}] = [f_{11}(X_{(1)}), \ldots, f_{1k}(X_{(k)})]$,
where the $f_{1i}$'s are real-valued functions, is a k-dimensional random
variable, the components of which are chosen so as to optimize a criterion
based on some function of their correlation matrix. The generalized higher
order canonical variables $Y^{(2)}, Y^{(3)}, \ldots$ are also k-dimensional
random variables, the components of which are chosen so as to optimize the
same criterion with some additional constraints imposed at each stage
regarding the relationships among these variables.

Note that for all known methods the $f_{1i}$'s are linear functions. However,
following the lines of non-linear principal component analysis, it is
expected that non-linear GCVs can also be developed where the situation
warrants them.

In the present work, the search is for linear combinations
of the elements of each group, such that they are standardized and are
equi-correlated. Formally, for the method to be presented here, we have

Definition 2: The first new GCV is the vector
$Y^{(1)} = [Y_1^{(1)}, \ldots, Y_k^{(1)}] = [\alpha_1^{(1)\prime} X_{(1)}, \ldots, \alpha_k^{(1)\prime} X_{(k)}]$,
where the $\alpha_i^{(1)}$, $i = 1, \ldots, k$, are chosen such that the
$Y_i^{(1)}$, $i = 1, \ldots, k$, are equi-correlated and the generalized
variance of $Y^{(1)}$ is minimum. The s-th order new GCV, $Y^{(s)}$,
$s = 2, 3, \ldots$, has components $Y_i^{(s)}$, $i = 1, \ldots, k$;
$Y_i^{(s)} = \alpha_i^{(s)\prime} X_{(i)}$, where the $\alpha_i^{(s)}$'s are
chosen so that the $Y_i^{(s)}$ are equi-correlated, $Y_i^{(s)}$ is
uncorrelated with $Y_i^{(r)}$, $r < s$, and the generalized variance of
$Y^{(s)}$ is minimum.


2.2 The constraint of equi-correlation and the criterion of minimum
generalized variance:

The motivations leading to seek such new GCVs are presented next.
Note that, though $Y^{(s)}$ has equi-correlated components, the original
vector variable $X$ may have any arbitrary dispersion matrix.

(A) The equi-correlation model - the constraint: The equi-correlation
model not only has wide applications and interesting properties but, as we
will see, leads to simplifications in the analysis of new GCVs.

(A.a) Examples and uses: The equi-correlation model has been widely used
in various fields, e.g. by Votaw (1950) in medical experiments, Seal (1964)
in the study of basic patterns of growth of grasshoppers, Gupta and
Panchapakesan (1969) in ranking and selection procedures, etc. Various
problems of statistical inference have been researched using this model,
e.g. in the same year (1968) in Biometrika by Geisser and Desu, by Gleser
and also by Han. Because of its numerous applications, extensive tables of
the equi-correlated multivariate normal c.d.f. were published by Milton (1963).


(A.b) Properties: The equi-correlation structure possesses some
interesting properties. Two of them are discussed below.

(i) In many tests of Multivariate Analysis, e.g. tests in MANOVA, Profile
Analysis, Growth Curve Analysis etc., if the sample size from each population
is not at least p, then in order that the mean square ratios in these
Repeated Measurements Designs have exact F-distributions, the
equi-correlation condition was thought to be necessary. However, Huynh and
Feldt (1970) proved that it is only a sufficient condition.

(ii) Another property follows from the following

Lemma: For any $k \times k$ correlation matrix $R = (r_{ij})$,
$|R| \le [1 + (k-1)\bar{r}](1 - \bar{r})^{k-1}$, where
$\bar{r} = \sum_{i \ne j} r_{ij} / [k(k-1)]$, with equality iff
$r_{ij} = \bar{r}$ for all $i \ne j$.

Proof: See Aitken, Nelson and Reinfurt (1968).
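
The inequality in the lemma can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the sample correlation matrix are illustrative choices and do not come from the source.

```python
import numpy as np

def mean_offdiag(R):
    """Average of the off-diagonal entries of a square matrix R."""
    k = R.shape[0]
    return (R.sum() - np.trace(R)) / (k * (k - 1))

def equicorrelation_bound(R):
    """Right-hand side of the lemma: [1 + (k-1) r](1 - r)^(k-1),
    with r the mean off-diagonal correlation of R."""
    k = R.shape[0]
    r = mean_offdiag(R)
    return (1 + (k - 1) * r) * (1 - r) ** (k - 1)

# Illustrative 3 x 3 correlation matrix (hypothetical values).
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

det_R = np.linalg.det(R)
bound = equicorrelation_bound(R)

# Equi-correlated matrix with the same mean correlation: equality case.
r_bar = mean_offdiag(R)
R_eq = (1 - r_bar) * np.eye(3) + r_bar * np.ones((3, 3))
det_R_eq = np.linalg.det(R_eq)
```

In this sketch `det_R` does not exceed `bound`, while `det_R_eq` coincides with it up to rounding, which is the equality case of the lemma.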

Note that by using the constraint that the new GCVs are
equi-correlated, we gain not only from practical considerations,
such as wide applications in a variety of fields and the availability of
tables for the probability integrals in the case of a multivariate normal
population, but also from a theoretical point of view, since many tests
in Multivariate Analysis can now be performed on these new GCVs because they
satisfy a sufficient condition for the validity of such test criteria.
Further, we have the important advantage of working with a vector having
a much smaller number of components than the original one. Also,
if the criterion is minimization of the generalized variance, then in view of
property (ii) above, the method seems to utilize the minimax principle.

(B) The generalized variance - the criterion: The importance of the
generalized variance in Multivariate Analysis is well known. We review
some interesting features of it below.

(B.a) The generalized variance as a measure of multi-dimensional scatter:

The generalized variance of a random vector variable is the
determinant of the dispersion matrix of that variable. For the correlation
matrix R, |R| was termed the 'scatter coefficient' by Frisch in 1929. This
definition of the generalized variance is a natural extension of the
variance in the univariate case as a measure of dispersion.

Here we will be concerned with only one type of measure of
dispersion, namely, the scatter of a random variable from a point of
reference (and will thus disregard generalizations of other types like
the range etc.). Some of the following discussion is found in Mathai (1967).

If $X'$ is a random variable, $m$ is a fixed point of reference and
$X = X' - m$, then a measure of dispersion for $X'$ from $m$ can be defined by
a metric $D$ satisfying the following axioms:

K1. $D(X) \ge 0$, $D(X) = 0$ iff $X = 0$ almost surely.

K2. $D(aX) = |a| D(X)$, where a is a scalar quantity.

K3. $D(X+Y) \le D(X) + D(Y)$, where $Y'$ is another random variable and $Y = Y' - m$.

K4. $D(X) = 1$ if $|X| = 1$ almost surely.

A $D(X)$ satisfying K1 to K4 can be called a measure of dispersion in $X$.
From statistical considerations two more desirable properties of $D(X)$ are

K5. $D(aX+b) = |a|^r D(X)$, where a, b and r (r > 0) are scalars.

K6. $D(X+Z) = D(X) + D(Z)$, where Z is another random variable independent
of X.

If $E$ denotes the operator of mathematical expectation then an example of a
measure of dispersion is given by $D(X) = [E|X|^r]^{1/r}$ for fixed $r \ge 1$.
$E|X|^2 = \sigma^2$ satisfies K5 and K6, where $E(X') = m$.
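
The axioms can be illustrated on small discrete distributions, where every expectation is an exact finite average. The sketch below assumes NumPy; the sample values and the helper names `dispersion` and `variance` are hypothetical and serve only to check K2 to K6 numerically.

```python
import numpy as np

def dispersion(x, r=2.0):
    """D(X) = (E|X|^r)^(1/r) for X uniform on the values in x."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x) ** r) ** (1.0 / r)

def variance(x):
    """Second moment about the mean for X uniform on the values in x."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)

# X and Y are deviations from the reference point, defined on a common
# equiprobable sample space of five outcomes (hypothetical values).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([1.0, -3.0, 2.0, 0.5, -0.5])
a, b = -3.0, 7.0

k2_holds = np.isclose(dispersion(a * x), abs(a) * dispersion(x))
k3_holds = dispersion(x + y) <= dispersion(x) + dispersion(y) + 1e-12
k4_holds = np.isclose(dispersion(np.array([1.0, -1.0, 1.0])), 1.0)

# The variance satisfies K5 with r = 2.
k5_holds = np.isclose(variance(a * x + b), abs(a) ** 2 * variance(x))

# K6: independence is imposed by forming the product sample space of X and Z.
z = np.array([0.0, 4.0, -1.0])
sums = (x[:, None] + z[None, :]).ravel()
k6_holds = np.isclose(variance(sums), variance(x) + variance(z))
```

The product sample space is used for K6 because additivity of the variance relies on independence; on a common sample space with dependent variables it would generally fail.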


If $(X', Y')$ is a bivariate random variable, $(m_1, m_2)$ is a point
of reference and $(X, Y) = (X' - m_1, Y' - m_2)$, then the joint dispersion
$D(X, Y)$ in $(X, Y)$ can be defined by the following axioms:

L1. $D(X,Y) = D(Y,X)$.

L2. $D(aX,Y) = a D(X,Y)$, where a is a scalar quantity.

L3. $D(X+Z,Y) = D(X,Y) + D(Z,Y)$, where $Z$ is the deviation of another random
variable $Z'$ from its point of reference, $Z'$ having a joint distribution
with $(X,Y)$.

L4. $D(X,X) = D^2(X)$, where $D(X)$ is a univariate measure of dispersion.

A desirable property here would be

L5. $D[A(X,Y)+B] = |A|^r D(X,Y)$, where r (r > 0) is a scalar, A, B are matrices
with scalar elements and $A(X,Y)+B$ is a non-singular transformation of $(X,Y)$.

The concept of covariance provides an example of a measure of joint
dispersion.

Finally, consider a random vector $X$ with k components. Let $d_{ij}$ denote a
measure of joint dispersion in the i-th and j-th variables. So the matrix
$D = (d_{ij})$ can be taken as a multivariate measure of joint dispersion in
the k variables. Note that $\Sigma$ provides an example of such a measure. A
measure of generalized dispersion or multi-dimensional scatter would be
a scalar quantity arising as a multivariate analogue of a univariate
measure of dispersion. Hence such a measure can be defined as any norm
of the matrix, i.e. it should satisfy

M1. $\|D\| \ge 0$, $\|D\| = 0$ iff $D$ is a null matrix.

M2. $\|aD\| = |a| \, \|D\|$, where a is a scalar quantity.

M3. $\|C+D\| \le \|C\| + \|D\|$, where C is another matrix defined similarly.

M4. $\|CD\| \le \|C\| \cdot \|D\|$.

A desirable property here would be

M5. $\|AD(X)+b\| = |A|^r \|D\|$, where r > 0 is a scalar and A is non-singular.

Hence taking $\Sigma$, the usual variance-covariance matrix, for $D$, any norm of
$\Sigma$ can be taken as a measure of generalized dispersion. The largest
characteristic root of $\Sigma$ is an example of such a measure.

Note first that it is reasonable to expect that any scalar
function of $\Sigma$ used as a measure of multi-dimensional statistical scatter
should take into consideration the magnitude of the correlations among
the variables. Secondly, van der Vaart (1965) has shown that the expected
volume of the simplex formed by k+1 random points in k dimensions, or by
k random points and the mean vector, is equal to the generalized variance,
$|\Sigma|$. This is a natural generalization of the fact that the expected
distance between two points or one point and the mean is the variance in
one dimension. Thirdly, if the probability that a random point will lie
in an ellipsoid of a k-dimensional distribution per unit volume is large,
then the population is well concentrated about the mean. Finally, for a
multivariate normal population, the MLE of $|\Sigma|$ is the sample generalized
variance, as in the univariate case, and the distributional properties of
the sample generalized variance have a marked similarity with those of the
sample variance; e.g. for a random sample of size n, $ns^2$ is distributed
as $\sigma^2 \chi^2_{n-1}$ in the univariate case and $n^k s^{2k}$ as
$\sigma^{2k} \prod_{i=1}^{k} \chi^2_{n-i}$ in the multivariate
case with a vector variable of k components, where $s^{2k}$ and $\sigma^{2k}$
are the sample and population generalized variances respectively.

From the above properties and the works of Mathai (1967)
entitled 'Dispersion theory' and Wilks (1967) entitled 'Multidimensional
statistical scatter', the term generalized variance for the determinant
of the dispersion matrix, and its use as a measure of generalized
dispersion, seem well justified (though it is a semi-norm).

(B.b) Applications of the generalized variance: Some situations
where the generalized variance can be used are cited first. Next, some
examples where its use has been advocated are presented.

Mathai (1967) pointed out that various statistical problems of
estimation and tests of hypotheses can be looked upon as the study of
properly defined measures of dispersion. Using the work of van der Vaart
(1965), let $Q_X^2 = |B|^2 / (k+1)!$, where $X_i$ is $k \times 1$,
$i = 1, \ldots, k+1$. If $X \sim F$, then $Q_X^2 \sim H(F)$, with mean equal to
the generalized variance of $X$. This is a very important and useful
observation for testing purposes. Hence, any test for the equality of two
location parameters can be used to provide a test for equal generalized
variances. So, standard non-parametric tests for location can be used when
the underlying distribution is unknown. Steyn (1978) has pointed out that
testing the null hypothesis that the population mean vector $\mu$ of the
multivariate normal distribution with mean $\mu$ and dispersion matrix
$\Sigma$ remains constant during the sampling process, against the
alternative that the mean vector varies during the process, is equivalent
to testing the null hypothesis that the generalized variance $= |\Sigma|$
against the alternative hypothesis that the generalized variance
$= |\Sigma^*|$, where $\Sigma^* = (I + 2D/n)\Sigma$,
$D = \sum_{r=1}^{n} \mu_r \mu_r' \Sigma^{-1}$, n is the sample size
and $X_r \sim N(\mu_r, \Sigma)$, $r = 1, \ldots, n$.
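
The statement that $Q_X^2$ has the generalized variance as its mean can be verified exactly for a small discrete distribution. The sketch below assumes NumPy and, importantly, assumes a specific form for $B$ that is not legible in the source: the $(k+1) \times (k+1)$ matrix whose first row consists of ones and whose remaining rows hold the $k+1$ sample points as columns. The support points are hypothetical.

```python
import itertools
import math
import numpy as np

# Support of a discrete bivariate distribution (k = 2); each point has
# probability 1/4. The values are hypothetical.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
k = points.shape[1]

def q_squared(sample):
    """|B|^2 / (k+1)!, with B ASSUMED to be the (k+1) x (k+1) matrix whose
    first row is all ones and whose other rows hold the sample points
    as columns."""
    B = np.vstack([np.ones(k + 1), np.asarray(sample).T])
    return np.linalg.det(B) ** 2 / math.factorial(k + 1)

# Exact expectation over all (k+1)-tuples of independent draws.
tuples = itertools.product(points, repeat=k + 1)
expected_q2 = np.mean([q_squared(t) for t in tuples])

# Population dispersion matrix (divisor N) and its determinant.
sigma = np.cov(points.T, bias=True)
gen_var = np.linalg.det(sigma)
```

Under the assumed form of $B$, `expected_q2` and `gen_var` agree up to rounding, so a location test applied to values of $Q_X^2$ from two populations compares their generalized variances.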

Several  practical  examples  are  given  below. 

Example 1. (Press, 1972). A transistor produced by company j is
characterised by a vector of k measurements, $X_j : k \times 1$. Let
$X_j \sim N(\mu, \Sigma_j)$, $j = 1, \ldots, r$. A purchaser would want to
select the company with the property that he minimises his chance of
receiving a product that is sometimes of

Assuming a multivariate normal distribution, Gnanadesikan and Gupta (1970)
and earlier Eaton (1967) have considered the problem of ranking the r
underlying populations according to the magnitude of their generalized
variances. Assuming that the loss function associated with the given
ranking problem depends only on the r generalized variances
$(\sigma_1^{2k}, \ldots, \sigma_r^{2k})$, Eaton has shown that the decision
rule which ranks the populations according to the sample generalized
variances $(s_1^{2k}, \ldots, s_r^{2k})$ possesses the following properties.
It is (i) minimax within the class of all decision rules, (ii) admissible
within the class of decision rules which depend only on
$(s_1^{2k}, \ldots, s_r^{2k})$ and (iii) the uniformly best decision rule
among the class of rules which depend only on $(s_1^{2k}, \ldots, s_r^{2k})$
and are invariant under permutations of $(s_1^{2k}, \ldots, s_r^{2k})$.
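
The ranking rule described above, ordering populations by their sample generalized variances, can be sketched as follows. This is only an illustration assuming NumPy; the function names, the supplier labels and the data are hypothetical, and the divisor n is used to match the normal-theory MLE.

```python
import numpy as np

def sample_generalized_variance(data):
    """Determinant of the sample dispersion matrix (divisor n).
    Rows of `data` are observations, columns are the k measurements."""
    data = np.asarray(data, dtype=float)
    return np.linalg.det(np.cov(data, rowvar=False, bias=True))

def rank_by_generalized_variance(samples):
    """Return the population labels ordered from smallest to largest
    sample generalized variance, together with the values themselves."""
    gv = {name: sample_generalized_variance(x) for name, x in samples.items()}
    return sorted(gv, key=gv.get), gv

# Hypothetical measurements (k = 2) from r = 3 suppliers. Rescaling the
# data by c multiplies the generalized variance by c^(2k) = c^4.
base = np.array([[0.0, 0.0], [1.0, 0.5], [-1.0, 0.4],
                 [0.5, -1.0], [-0.5, 0.1]])
samples = {"A": base, "B": 2.0 * base, "C": 0.5 * base}

ranking, gv = rank_by_generalized_variance(samples)
```

Here `ranking` lists supplier C first and supplier B last, reflecting the $c^{2k}$ scaling of the generalized variance.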

Example 2. Though the definition of optimality becomes a problem in the
multivariate situation, various authors have suggested the use of the
generalized variance in determining the optimum allocation of sampling
units. Ghosh (1958) and Arvanitis and Afonja (1971) have proposed to
determine the optimum sample size for the j-th stratum in stratified
random sampling by minimizing the generalized variance of the sample means.

Example 3. Goodman (1966) has advocated the use of the generalized
variance to compare the overall variability of the populations of maize.
It is apparent that a system of classification based on the overall
similarities among the races is necessary both for the optimum usage of
the race collections in breeding programs and for the study of the
evolution of maize. Knowledge of the origin of the races, against which
values of the generalized variance can be compared, motivates here the
generalized variance as a reasonable measure of the overall variability
and its use as a method of classifying the races of maize.

(C) Minimization of the generalized variance - the choice: New GCVs are
selected such that their generalized variance is minimum at each stage.
The choice of minimization stems from two important considerations.
Firstly, in the case of 2 groups, it is clear that maximizing the
correlation of the 2 linear functions corresponding to the 2 groups is
equivalent to minimizing the generalized variance of these two linear
combinations. This property is extended to the case of k groups.
Secondly, as has been illustrated in the various examples in sub-section
(B) above, minimization of the generalized variance is widely used in
many contexts and is of great practical use in various situations.
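
The two-group equivalence can be made concrete: for standardized linear combinations with correlation $\rho$, the dispersion matrix has unit diagonal and off-diagonal $\rho$, so its determinant is $1 - \rho^2$. The sketch below assumes NumPy and computes the classical first canonical pair by whitening and a singular value decomposition; this is a standard computational route chosen for illustration, not a procedure taken from the source, and the dispersion matrix is hypothetical.

```python
import numpy as np

def first_canonical_pair(S11, S12, S22):
    """First canonical correlation and coefficient vectors for two groups,
    obtained by whitening each group and taking an SVD."""
    L1 = np.linalg.cholesky(S11)
    L2 = np.linalg.cholesky(S22)
    # K = L1^{-1} S12 L2^{-T}; its singular values are the canonical correlations.
    K = np.linalg.solve(L1, S12) @ np.linalg.inv(L2).T
    U, s, Vt = np.linalg.svd(K)
    a = np.linalg.solve(L1.T, U[:, 0])   # satisfies a' S11 a = 1
    b = np.linalg.solve(L2.T, Vt[0, :])  # satisfies b' S22 b = 1
    return s[0], a, b

def generalized_variance_of_pair(a, b, S11, S12, S22):
    """Determinant of the 2 x 2 dispersion matrix of (a'X1, b'X2)."""
    D = np.array([[a @ S11 @ a, a @ S12 @ b],
                  [a @ S12 @ b, b @ S22 @ b]])
    return np.linalg.det(D)

# Hypothetical positive definite dispersion matrix with p1 = p2 = 2.
A = np.array([[1.0, 0.0], [0.5, 1.0], [1.0, 1.0], [0.0, 1.0]])
Sigma = A @ A.T + np.eye(4)
S11, S12, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]

rho, a, b = first_canonical_pair(S11, S12, S22)
gv_canonical = generalized_variance_of_pair(a, b, S11, S12, S22)

# Another standardized pair: the first variable of each group.
a0 = np.array([1.0, 0.0]) / np.sqrt(S11[0, 0])
b0 = np.array([1.0, 0.0]) / np.sqrt(S22[0, 0])
gv_other = generalized_variance_of_pair(a0, b0, S11, S12, S22)
```

In this sketch `gv_canonical` equals `1 - rho**2` and is no larger than `gv_other`, as expected when the canonical pair maximizes the correlation.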

2.3 Formulation of the problems:

Broadly speaking, the two distinct problems here are those of
construction and statistical inference related to GCVs. The present work
deals with a random vector which can be meaningfully grouped into k
disjoint sub-vectors. For the problem of construction, in its complete
generality, one would face at least three problems of optimization:
(i) how to select the groups in the absence of a given grouping, (ii) how
to select the components of the GCVs and (iii) how to decide on the
optimal stage of stopping for higher order GCVs. Problem (i) can possibly be
solved partly by Cluster Analysis. For (ii), some available methods have
been reviewed above. In the case of more than 2 groups, for (iii), as
Kettenring has remarked for some of the methods above, the situation is
'somewhat arbitrary'.

No work seems to be available for the problems of statistical
inference related to the GCVs. Various problems would be interesting
here. The distribution of GCVs needs to be derived. In case the form of
the exact distribution is not of much use, suitable approximations and
the large sample distribution will be worth investigating. Several hypotheses
develop here naturally from the definition of GCV and statistical tests
are needed for these. Some such hypotheses are discussed next.

(i) It would be natural to find out the extent of multidimensional statistical
scatter explained by considering k groups. So, a test of hypothesis for a
specific value of the generalized variance of the new GCV would be needed.
Since one would want the new GCVs to have small scatter, the alternative
hypothesis naturally should be of the less than type. So, we would test
$H_0: |\Sigma_Y| = \sigma_0^2$ against $H_1: |\Sigma_Y| < \sigma_0^2$.

(ii) Next, it would be interesting
to see if, for the same dimension, regrouping of the variables in the
original vector produces a better result. For the new GCVs the related
tests of hypotheses are: $H_{01}: |\Sigma_{Y_1}| = |\Sigma_{Y_2}|$ against
$H_{11}: |\Sigma_{Y_1}| < |\Sigma_{Y_2}|$; $H_{02}$: the $|\Sigma_{Y_j}|$'s are
equal against $H_{12}$: the $|\Sigma_{Y_j}|$'s are in a given order.

(iii) Also, it is reasonable to explore whether the
consideration of k groups leads to any substantial gain as compared to
a smaller number of groups, which may be explicitly stated a priori or which
may be constructed by reasonably regrouping the variables into fewer
groups or amalgamating some of the original groups. The
advantage here would be that, in terms of the criterion, a GCV with
fewer components may perform as well as, if not better than,
the original one with a larger number of components. This consideration
may be, as in (ii), extended to the case of more than two GCVs with the
possibility of some of them having the same number of components. Since the
criterion here must be comparable, the related tests of hypotheses become
$H_{01}: |\Sigma_{Y_1}|^{1/k_1} = |\Sigma_{Y_2}|^{1/k_2}$ against
$H_{11}: |\Sigma_{Y_1}|^{1/k_1} < |\Sigma_{Y_2}|^{1/k_2}$ and
$H_{02}$: the $|\Sigma_{Y_j}|^{1/k_j}$ are all equal against
$H_{12}: |\Sigma_{Y_i}|^{1/k_i} < |\Sigma_{Y_j}|^{1/k_j}$, $k_i < k_j$,
and a given ordering of generalized variances for new GCVs with
equal number of components.
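
The quantity $|\Sigma_Y|^{1/k}$ appearing in these hypotheses puts generalized variances of vectors with different numbers of components on a common, per-component scale. The following population-level sketch, assuming NumPy, evaluates it for equi-correlated standardized vectors; the function names and the chosen values of k and of the common correlation are hypothetical, and no test statistic is implied.

```python
import numpy as np

def equicorrelation_matrix(k, rho):
    """k x k dispersion matrix with unit variances and common correlation rho."""
    return (1 - rho) * np.eye(k) + rho * np.ones((k, k))

def standardized_generalized_variance(sigma):
    """|Sigma|^(1/k): the generalized variance on a per-component scale."""
    sigma = np.asarray(sigma, dtype=float)
    k = sigma.shape[0]
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("dispersion matrix must be positive definite")
    return float(np.exp(logdet / k))

# Two equi-correlated vectors with k1 = 2 and k2 = 3 components and the
# same common correlation (hypothetical values).
g2 = standardized_generalized_variance(equicorrelation_matrix(2, 0.6))
g3 = standardized_generalized_variance(equicorrelation_matrix(3, 0.6))
```

For an equi-correlated standardized vector the determinant is $[1 + (k-1)\rho](1-\rho)^{k-1}$, so `g2` is $0.64^{1/2}$ and `g3` is $0.352^{1/3}$; the three-component vector has the smaller standardized generalized variance in this example.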

Note that in (ii) $Y_j$ referred to the new GCV obtained by the j-th mode of
grouping the original random vector, $j = 1, 2, \ldots$, while in (iii) $Y_j$
referred to the new GCV with number of components $k_j$, corresponding to
the j-th mode of grouping, $j = 1, 2, \ldots$ A situation of this kind
arises in the following example [which corresponds to $H_{01}$
in (ii)].

Example: (Thurstone and Thurstone, 1941). Horst, Kettenring and Seal (1964)
discussed this example. Three (= k) sets of scores by several people on
three batteries of three tests each (group size = 3 for each group) were
obtained. The three tests in each battery were intended to measure,
respectively, the verbal, the numerical and the spatial abilities of the
persons tested. The correlation matrix of the original scores was given.
Gnanadesikan (1977) commented "An interesting alternative analysis in
this example would be to regroup the nine variables into three sets
corresponding to the three abilities measured rather than the three
batteries of tests." How profitable this alternative grouping would be
in terms of multidimensional scatter can be judged by the test for $H_{01}$
in (ii) above. Note that, since the $Y_j$'s would be correlated, such tests
may not be very simple, even if the GCVs have a simple distribution.


No  work  seems  to  be  available  for  the  problems  of  statistical 
inference  related  to  the  GCVs.  Various  problems  would  be  interesting 
here.  The  distribution  of  GCVs  needs  to  be  derived.  Zn  case  the  form  of 
the  exact  distribution  is  of  not  much  use,  suitable  approximations  and 
large  sample  distribution  will  be  worth  investigating.  Several  hypotheses 
develop  here  naturally  from  the  defination  of  GCV  and  statistical  tests 
are  needed  for  these.  Some  such  hypotheses  are  discussed  next,  (i)  It 
would  be  natural  to  find  out  the  extent  of  multidimensional  statistical 
scatter  explained  by  considering  k  groups.  So,  a  test  of  hypothesis  for  a 
specific  value  of  the  generalized  variance  of  the  new  GCV  would  be  needed. 
Since  one  would  want  the  new  GCVs  to  have  small  scatter,  the  alternative 
hypothesis  naturally  should  be  of  the  less  than  type.  So,  we  would  test 
V  Uy  !■  Op  against  H  s  |lyj  <  1  (ii)  Next,  it  would  be  interesting 

to  see,  if  for  the  same  dimension,  regrouping  of  the  variables  in  the 
original  vector  produces  better  result.  For  the  new  GCVs  the  related 
tests  of  hypotheses  are  -  HQ1:  |Zylf  -|Zy2|  against  Hns  lEyll<l2Y2l J 
H02!  1 2y^j  1 3  are  equal  against  Hi2!^£)ls  are  *n  a  9*ven  order. 

(iii)  Also,  it  is  reasonable  to  explore  the  possibility  of  whether  the 
consideration  of  k  groups  leads  to  any  substantial  gain  as  compared  to 
less  number  of  groups  which  may  be  explicitly  stated  a-priori  or  which 
may  be  constructed  by  reasonably  regrouping  the  variables  in  a  fewer 
number  of  groups  or  amalgamating  some  of  the  original  groups.  The 
advantage  here  would  be,  that,  in  terms  of  the  criterion,  a  GCV  with  a 
fewer  number  of  components  nay  perform  as  good  as,  if  not  better,  than 
the  original  one  with  a  larger  number  of  components.  This  consideration 
My  be,  as  in  (ii)  ,  extended  to  the  case  of  more  than  two  GCVs  with  the 


16 


possibility  of  some  of  them  to  have  same  number  of  components.  Since  the 
criterion  here  must  be  compar able ,  the  related  tests  of  hypotheses  become 
H01*  lryil1/kl  “  l^l1^2  gainst  lEyl|1/kl<  lEy2I1/k2  and 

H02!  iryj|1/k3  are  a11  equal  against  H12:  |Eyii1Ai  <  l^l^j.  *L<  k.. 

and  a  given  ordering  of  generalized  variances  for  new  GCVs  with 

equal  number  of  components . 

Note that in (ii) $Y_j$ referred to the new GCV obtained by the j-th mode of grouping the original random vector, j = 1,2,..., while in (iii) $Y_j$ referred to the new GCV with number of components $k_j$, corresponding to the j-th mode of grouping, j = 1,2,... A situation of this kind is illustrated by the following example [which corresponds to $H_{01}$ in (ii)].

Example: (Thurstone  and  Thurstone,  1941).  Horst,  Kettenring  and  Seal  (1964) 
discussed  this  example.  Three  (=k)  sets  of  scores  by  several  people  on 
three  batteries  of  three  tests  each  (group  size  =  3  for  each  group)  were 
obtained.  The  three  tests  in  each  battery  were  intended  to  measure, 
respectively,  the  verbal,  the  numerical  and  the  spatial  abilities  of  the 
persons  tested.  The  correlation  matrix  of  the  original  scores  was  given. 
Gnanadesikan  (1977)  commented  "An  interesting  alternative  analysis  in 
this  example  would  be  to  regroup  the  nine  variables  into  three  sets 
corresponding  to  the  three  abilities  measured  rather  than  the  three 
batteries  of  tests."  How  profitable  this  alternative  grouping  would  be 
in terms of multidimensional scatter can be judged by the test for $H_{01}$ in (ii) above. Note that, since the $Y_j$'s would be correlated, such tests may not be very simple, even if the GCVs have a simple distribution.


3.  DERIVATION  OF  THE  NEW  GENERALIZED  CANONICAL  VARIABLES


3.1  Notations 

Let $X' = (X_1', \ldots, X_k')$, $X_i' = (X_{i1}, \ldots, X_{ip_i})$, $i = 1, \ldots, k$, $\sum_{i=1}^{k} p_i = p$, be a random vector variable meaningfully partitioned into k disjoint sub-vectors. Let $E(X)$ denote the expectation of $X$ and, since our interest lies only in the dispersion of $X$, assume $E(X) = 0$. Further, let $\Sigma(Z)$ and $|\Sigma(Z)|$ denote the dispersion matrix and the generalized variance of the random vector $Z$, respectively. Then $\Sigma(X) = \Sigma = (\Sigma_{ij})$, $i, j = 1, \ldots, k$, partitioned conformably with $X$.


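Denote the s-th new generalized canonical variable by $Y^{(s)\prime} = (Y_1^{(s)}, \ldots, Y_k^{(s)})$, where $Y_i^{(s)} = \alpha_i^{(s)\prime} X_i$ and $\alpha_i^{(s)}$ is the coefficient vector of $X_i$ at the s-th stage. Let $\alpha^{(s)\prime} = (\alpha_1^{(s)\prime}, \ldots, \alpha_k^{(s)\prime})$. Further, let $\alpha' = (\alpha_1', \ldots, \alpha_k')$ denote the real but otherwise arbitrary coefficient vector which is used at the outset to derive the new generalized canonical variables $Y$ at any stage and satisfies the condition that $\Sigma(Y)$ is equi-correlated, i.e.,

$$
\Sigma(Y) =
\begin{pmatrix}
\alpha_1'\Sigma_{11}\alpha_1 & \alpha_1'\Sigma_{12}\alpha_2 & \cdots & \alpha_1'\Sigma_{1k}\alpha_k \\
\alpha_2'\Sigma_{21}\alpha_1 & \alpha_2'\Sigma_{22}\alpha_2 & \cdots & \alpha_2'\Sigma_{2k}\alpha_k \\
\vdots & \vdots & & \vdots \\
\alpha_k'\Sigma_{k1}\alpha_1 & \alpha_k'\Sigma_{k2}\alpha_2 & \cdots & \alpha_k'\Sigma_{kk}\alpha_k
\end{pmatrix}
=
\begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix}
= \Sigma_\rho, \text{ say,}
$$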


where $\rho$ denotes the equi-correlation coefficient for the $Y_i$'s, $i = 1, \ldots, k$. Also, $\rho^{(s)}$ will denote the s-th new generalized canonical correlation coefficient.

3.2  Derivations  of  the  new  generalized  canonical  variables. 

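For the first new generalized canonical variable, we want to minimize $|\Sigma_\rho|$ subject to the scale conditions that the variances of the $Y_i$'s are all unity and that they are equi-correlated. We use the method of Lagrange's undetermined multipliers. Let

$$
\phi = |\Sigma_\rho| + \frac{1}{k(k-1)} \sum_{i} \lambda_i \left( \alpha_i'\Sigma_{ii}\alpha_i - 1 \right) + \sum_{i \ne j} \nu_{ij} \Big\{ \sum_{l \ne m} \alpha_l'\Sigma_{lm}\alpha_m - k(k-1)\, \alpha_i'\Sigma_{ij}\alpha_j \Big\}.
$$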

The last term introduces the condition of equi-correlation of the $Y_i$'s and is justified by the result that $\bar{Z} = \sum_{i=1}^{m} \sum_{j=1}^{n} Z_{ij}/mn = Z_{st}$ for all $s, t$ iff $Z_{ij} = Z_{kl}$ for all $i, j, k, l$. The $\lambda_i$'s and $\nu_{ij}$'s are Lagrange's undetermined multipliers, with $\nu_{ij} = \nu_{ji}$, $i \ne j$.


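Differentiating with respect to $\alpha_t$,

$$
\frac{\partial \phi}{\partial \alpha_t} = \frac{2\,C(\rho)}{k(k-1)} \sum_{j (\ne t)} \Sigma_{tj}\alpha_j + \frac{2\lambda_t}{k(k-1)} \Sigma_{tt}\alpha_t + 2 \Big( \sum_{i \ne j} \nu_{ij} \Big) \sum_{j (\ne t)} \Sigma_{tj}\alpha_j - 2k(k-1) \sum_{j (\ne t)} \nu_{tj} \Sigma_{tj}\alpha_j = 0 \qquad \ldots (t.1)
$$

for $t = 1, \ldots, k$, writing $t+1 = k+1$ as 1, and where $C(\rho) = \partial |\Sigma_\rho| / \partial \rho$. We also used $\rho = \sum_{i \ne j} \alpha_i'\Sigma_{ij}\alpha_j / k(k-1)$ in the first term and the fact that $\alpha_i'\Sigma_{ij}\alpha_j = \alpha_j'\Sigma_{ji}\alpha_i$ for all $i, j = 1, \ldots, k$ in the last term.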

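Multiplying (t.1) by $\alpha_t'$, subtracting successive equations and using the condition of equi-correlation, we get

$$
\frac{\lambda_t - \lambda_{t+1}}{k(k-1)} - k(k-1)\rho \Big\{ \sum_{j (\ne t)} \nu_{tj} - \sum_{j (\ne t+1)} \nu_{t+1,j} \Big\} = 0
$$

or

$$
\{k(k-1)\}^{-2} (\lambda_t - \lambda_{t+1}) = \rho \Big\{ \sum_{j (\ne t,\, t+1)} (\nu_{tj} - \nu_{t+1,j}) + (\nu_{t,t+1} - \nu_{t+1,t}) \Big\} \qquad \ldots (t.2)
$$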


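A solution to the set of equations defined by (t.2), $t = 1, \ldots, k$, is given by

$\nu_{tj} = \nu_{t+1,j}$ for all $j \ne t, t+1$; $\nu_{t,t+1} = \nu_{t+1,t}$ for all $t$; and $\lambda_t = \lambda_{t+1}$ for all $t$, $t = 1, \ldots, k$.  ...(3.2.1)

Using these values in (t.1) and $\lambda$ as the common value of the $\lambda_t$'s we get the simpler equation [since (3.2.1) with $\nu_{ij} = \nu_{ji}$ gives $\nu_{ij}$'s all equal, so that the terms in the $\nu_{ij}$'s cancel]

$$
C(\rho) \sum_{j (\ne t)} \Sigma_{tj}\alpha_j + \lambda\, \Sigma_{tt}\alpha_t = 0 \qquad \ldots (t.3)
$$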


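Multiplying the above equation by $\alpha_t'$ we get, using the conditions of equi-correlation and of unit variance of $Y_t$, i.e. $\alpha_t'\Sigma_{tt}\alpha_t = 1$, that $\lambda = -(k-1)\rho\,C(\rho)$. Substituting this value of $\lambda$ in (t.3) we finally get, assuming $C(\rho) \ne 0$,

$$
\sum_{j (\ne t)} \Sigma_{tj}\alpha_j - (k-1)\rho\, \Sigma_{tt}\alpha_t = 0 \qquad \ldots (t.4)
$$

Writing the above system of equations explicitly, we have

$$
\begin{pmatrix}
-(k-1)\rho\Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1k} \\
\Sigma_{21} & -(k-1)\rho\Sigma_{22} & \cdots & \Sigma_{2k} \\
\vdots & \vdots & & \vdots \\
\Sigma_{k1} & \Sigma_{k2} & \cdots & -(k-1)\rho\Sigma_{kk}
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}
= 0 \qquad \ldots (1)
$$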


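A non-trivial solution in $\alpha$ (which is necessary for a solution satisfying the condition that the variances of the $Y_i$'s are all unity) exists if and only if the coefficient matrix of the $\alpha$ is singular. Writing this condition in a compact form we get

$$
|\Sigma_{od} - (k-1)\rho\, \Sigma_d| = 0 \quad \text{or} \quad |\Sigma - \lambda^* \Sigma_d| = 0 \qquad \ldots (2)
$$

where

$$
\Sigma_{od} =
\begin{pmatrix}
0 & \Sigma_{12} & \cdots & \Sigma_{1k} \\
\Sigma_{21} & 0 & \cdots & \Sigma_{2k} \\
\vdots & \vdots & & \vdots \\
\Sigma_{k1} & \Sigma_{k2} & \cdots & 0
\end{pmatrix},
\qquad
\Sigma_d =
\begin{pmatrix}
\Sigma_{11} & 0 & \cdots & 0 \\
0 & \Sigma_{22} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \Sigma_{kk}
\end{pmatrix},
$$

and $\lambda^* = 1 + (k-1)\rho$.

We may call $\Sigma_d$ a diagonal super-matrix (a term due to Horst) and $\Sigma_{od}$ an off-diagonal super-matrix with respect to $\Sigma$.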

Note 1. The solution in (2) coincides with that of McKeon and hence with that of various other authors, as pointed out in Chapter I. The criterion and the derivation were of course different here.

2. In order that the $\rho$'s obtained from (2) satisfy the condition of a correlation matrix, we must have $-1/(k-1) \le \rho \le 1$. Here $\rho \le 1$ by definition. Next, from Theorem 1 in the Appendix we note that $\lambda^* \ge 0$, since $\Sigma$ is positive semi-definite and we assume that $\Sigma_d$ is positive definite. [If $\Sigma_d$ is only positive semi-definite, then at least one of the $\Sigma_{ii}$'s must be singular. The assumption reasonably requires that, at least within each group, we do not have any redundant variables, i.e. those which bear a linear dependence with other elements in the same group.] Hence, $\rho \ge -1/(k-1)$.

We solve the determinantal equation in (2) for the p roots in terms of the $\lambda^*$'s and then, by the one-one relationship, determine the corresponding $\rho$'s. Since our object is to minimize $|\Sigma_\rho|$, we choose that $\rho$ for which $|\Sigma_\rho|$ is minimum.

Now,

$$
\frac{d}{d\rho} |\Sigma_\rho| = -k(k-1)\rho(1-\rho)^{k-2} = C(\rho).
$$

So $|\Sigma_\rho|$ is increasing for $\rho < 0$ and decreasing for $\rho > 0$. Hence, we need to consider both positive and negative $\rho$'s and then choose the one, say $\rho^{(1)}$, for which $|\Sigma_\rho|$ is minimum. The coefficient vector $\alpha^{(1)}$ corresponding to $\rho^{(1)}$ is then determined from (1) and the conditions of equi-correlation and unit variances for the components of $Y^{(1)}$. Then the first new GCV is given by $Y^{(1)\prime} = (\alpha_1^{(1)\prime}X_1, \ldots, \alpha_k^{(1)\prime}X_k)$, where $\alpha^{(1)\prime} = (\alpha_1^{(1)\prime}, \ldots, \alpha_k^{(1)\prime})$ is partitioned in a fashion corresponding to that of $X$.
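The root-selection step described above can be sketched numerically. The following is a minimal illustration in Python with NumPy, not part of the original derivation: it assumes each $\Sigma_{ii}$ is positive definite, solves $|\Sigma - \lambda^* \Sigma_d| = 0$ through the symmetric form $\Sigma_d^{-1/2} \Sigma\, \Sigma_d^{-1/2}$, maps each root to $\rho = (\lambda^* - 1)/(k-1)$, and picks the root with the smallest $|\Sigma_\rho| = (1-\rho)^{k-1}\{1+(k-1)\rho\}$. The function names are hypothetical, and the returned coefficient vector solves system (1) only up to scaling; the unit-variance and equi-correlation conditions still have to be imposed on it.

```python
import numpy as np


def new_gcc_roots(sigma, sizes):
    """Roots rho and coefficient vectors of |Sigma - lambda* Sigma_d| = 0.

    sigma : (p, p) dispersion matrix; sizes : group sizes p_1, ..., p_k.
    Assumes every diagonal block Sigma_ii is positive definite.
    """
    sigma = np.asarray(sigma, dtype=float)
    k = len(sizes)
    inv_sqrt = np.zeros_like(sigma)  # block-diagonal Sigma_d^{-1/2}
    start = 0
    for p_i in sizes:
        sl = slice(start, start + p_i)
        vals, vecs = np.linalg.eigh(sigma[sl, sl])
        inv_sqrt[sl, sl] = (vecs / np.sqrt(vals)) @ vecs.T
        start += p_i
    # Symmetric form of the generalized problem Sigma a = lambda* Sigma_d a.
    lam_star, u = np.linalg.eigh(inv_sqrt @ sigma @ inv_sqrt)
    rho = (lam_star - 1.0) / (k - 1)
    alpha = inv_sqrt @ u  # columns solve system (1), up to scaling
    return rho, alpha


def generalized_variance(rho, k):
    """|Sigma_rho| for a k x k equi-correlation matrix."""
    return (1.0 - rho) ** (k - 1) * (1.0 + (k - 1) * rho)


def first_new_gcc(sigma, sizes):
    """Root (and unscaled coefficient vector) minimizing |Sigma_rho|."""
    rho, alpha = new_gcc_roots(sigma, sizes)
    idx = int(np.argmin(generalized_variance(rho, len(sizes))))
    return rho[idx], alpha[:, idx]
```

For three single-variable groups with common correlation 0.5, the roots are $\rho = 0.5$ and $\rho = -0.25$ (twice), and the first has the smaller generalized variance (0.5 against 0.78125), so it is the one selected.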

3. We had previously assumed that $C(\rho) \ne 0$, which means $\rho \ne 0$ and $\rho \ne 1$. From (2), note that if $|\Sigma_{od}| \ne 0$ then $\rho \ne 0$, and if $|\Sigma| \ne 0$ then $\rho \ne 1$ (and $\rho \ne -1/(k-1)$).


For higher order new generalized canonical variables, the constraints are discussed next and then the derivation is given below.

In the case of canonical correlation analysis, i.e., $k = 2$, $p_1 \le p_2$, at the first stage one seeks the pair $(Y_1^{(1)}, Y_2^{(1)}) = Y^{(1)\prime}$ with the maximum possible correlation. The procedure may be pursued further until $p_1$ pairs of canonical variables have been determined. At the s-th stage, the s-th pair of canonical variables $Y^{(s)}$ is found such that the corresponding s-th canonical correlation, $\rho^{(s)}$, is the maximum correlation attainable satisfying the conditions

corr$(Y_1^{(s)}, Y_1^{(t)})$ = corr$(Y_2^{(s)}, Y_2^{(t)})$ = corr$(Y_1^{(s)}, Y_2^{(t)})$ = corr$(Y_1^{(t)}, Y_2^{(s)})$ = 0, $t = 1, \ldots, s-1$.

Usually the first two conditions are sufficient, since they imply the latter ones, and sometimes even only the first. Kettenring has considered the above situations in detail.

For GCVs, the same approach may be followed, introducing suitable restrictions such that the GCVs for a particular stage are distinct from those of the previous stages. At the (r+1)-th stage we invoke the restrictions

corr$(Y_t^{(i)}, Y_j^{(r+1)}) = 0$, $i = 1, \ldots, r$; $t, j = 1, \ldots, k$,

which are equivalent to the restrictions above for $k = 2$, i.e., canonical variables. This type of restriction has been demonstrated by Kettenring to be equivalent to the restrictions

corr$(F_t^{(i)}, F_j^{(r+1)}) = 0$, $i = 1, \ldots, r$; $t, j = 1, \ldots, k$,

where the $F_j^{(r+1)}$ are the best fitting factors associated with the (r+1)-th stage fit of a k factor model like $Y = \sum_{j=1}^{k} \ell_j F_j + e$, where the $\ell_j$ are arbitrary non-null vectors, the $F_j$ are standardized random variables and $e$ is a vector of error variables, using a criterion such as the one where the factors are determined so that $F^{(1)}$ is the most important, $F^{(2)}$ the second most important and so on. He has also pointed out interesting relationships of the above restrictions with the MAXVAR and MINVAR procedures in a factor analytic set-up.

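At the (r+1)-th stage, using the above restriction, i.e.,

$\alpha_i^{(l)\prime}\Sigma_{ij}\alpha_j = 0$, $l = 1, \ldots, r$; $i, j = 1, \ldots, k$; or equivalently $\alpha_i^{(l)\prime}\Sigma_{ij}\alpha_j = \alpha_j'\Sigma_{ji}\alpha_i^{(l)} = 0 = \alpha_i'\Sigma_{ij}\alpha_j^{(l)}$, we have

$$
\phi_{r+1} = \phi + \sum_{i \ne j} \sum_{s} \theta_{ijs}\, \alpha_i'\Sigma_{ij}\alpha_j^{(s)} + \sum_{i} \sum_{s} c_{is}\, \alpha_i'\Sigma_{ii}\alpha_i^{(s)},
$$

where $\phi$ is the function defined in section 3.2. Hence, for $t = 1, \ldots, k$,

$$
\frac{\partial \phi_{r+1}}{\partial \alpha_t} = \frac{\partial \phi}{\partial \alpha_t} + \sum_{j (\ne t)} \sum_{s} \theta_{tjs}\, \Sigma_{tj}\alpha_j^{(s)} + \sum_{s} c_{ts}\, \Sigma_{tt}\alpha_t^{(s)} = 0 \qquad \ldots (t.5)
$$

Pre-multiplying (t.5) by $\alpha_t^{(i)\prime}$, we get

$$
\Big( \sum_{j (\ne t)} \theta_{tji} \Big) \rho^{(i)} + c_{ti} = 0 \qquad \ldots (t.6)
$$

and, subtracting successive equations,

$$
(t.6) - (t+1.6): \quad \Big[ \sum_{j} (\theta_{tji} - \theta_{t+1,ji}) \Big] \rho^{(i)} + (c_{ti} - c_{t+1,i}) = 0, \qquad t = 1, \ldots, k.
$$

A solution to these is obtained by taking the $\theta_{tjs}$ all equal and the $c_{ts}$ all equal. But from (t.6) this means $c_{ts} = 0$ for all $t, s$, $t = 1, \ldots, k$, $s = 1, \ldots, r$; and using these values for $\theta_{tjs}$ and $c_{ts}$ in (t.5) we get (t.1). Thus the previous arguments are applicable and we get (t.4) and (1). At the (r+1)-th stage, we choose that root of (2), say $\rho^{(r+1)}$, which gives the (r+1)-th lowest value of $|\Sigma_\rho|$ and call it the (r+1)-th new generalized canonical correlation, and the variable $Y^{(r+1)\prime} = (\alpha_1^{(r+1)\prime}X_1, \ldots, \alpha_k^{(r+1)\prime}X_k)$ obtained from a corresponding coefficient vector the (r+1)-th new generalized canonical variable.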


3.3  Sample new generalized canonical variables:

Since in practice both $\mu$ and $\Sigma$, the population mean and dispersion matrix, are unknown, we need to estimate the new GCVs from a sample of independent observations from the distribution of $X$. To simplify matters we now need to assume that $X \sim N_p(\mu, \Sigma)$. The procedure for non-normal populations may be pursued as done for canonical correlations, e.g. as in Rao (1973). Let $x_u$, $u = 1, \ldots, n$, be a random sample of n observations from the population of $X$ and let $\bar{x}$ and $S$ be the sample mean and dispersion matrix. Corresponding to the partition of $X$, partition $S$. Then the MLEs of $\Sigma_d$ and $\Sigma$ are $S_d$ and $S$ respectively. Now since $\{\mu, (\Sigma_d, \Sigma), \Lambda, A\}$, where $\Lambda = \mathrm{diag}(\rho^{(1)}, \ldots, \rho^{(p)})$ and $A = (\alpha^{(1)}, \ldots, \alpha^{(p)})$, is a single-valued function of $(\mu, \Sigma)$, it follows from Appendix A.2 that the MLE of $\rho^{(i)}$ is given by the corresponding characteristic root $r^{(i)}$, $i = 1, \ldots, p$, of $S S_d^{-1}$. Also the MLE of $\alpha^{(i)}$ is given by $a^{(i)}$ satisfying $a_t^{(i)\prime} S_{tt}\, a_t^{(i)} = 1$ and $(S_{od} - (k-1) r^{(i)} S_d)\, a^{(i)} = 0$. Hence we get

Theorem 1. The MLEs of the population non-zero new generalized canonical correlations $\rho^{(1)}, \ldots, \rho^{(q)}$, q being the number of non-zero roots, and of the corresponding normalized coefficient vectors $\alpha^{(1)}, \ldots, \alpha^{(q)}$, are given respectively by the ordered roots $r^{(1)}, \ldots, r^{(q)}$ (i.e. $r^{(i)}$ gives the i-th minimum value of $|\Sigma_r|$ among the values of $r$, $r = r_j$, $j = 1, \ldots, p$) of the determinantal equation $|S_{od} - (k-1) r S_d| = 0$, and by $a^{(1)}, \ldots, a^{(q)}$, where the $a^{(i)}$ satisfy

$(S_{od} - (k-1) r^{(i)} S_d)\, a^{(i)} = 0$, $\quad a_t^{(i)\prime} S_{tt}\, a_t^{(i)} = 1$, $\quad a_u^{(i)\prime} S_{uv}\, a_v^{(i)} = r^{(i)}$, $u \ne v$; $t, u, v = 1, \ldots, k$.


Note that, though unlikely, some of the non-zero $\rho^{(i)}$'s may be known to be equal. The estimates then have to be modified. Standard derivations of the MLEs for the canonical correlations (e.g. Giri (1977), whose approach has been followed here) do not consider such cases. However, Anderson (1963) has given MLEs of characteristic roots in the case of known multiple population roots for principal component analysis. The questions of under what situations some $\rho^{(i)}$'s are equal, and how to estimate them in those events, seem to be interesting problems for further investigation. Also note that in the above derivation of the MLEs we have used the Principle of Invariance of MLEs for non one-one functions advocated by Zehna (1966). This principle is stated in section A.2 of the Appendix.
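As a rough numerical companion to Theorem 1, the sketch below computes the sample roots from a data matrix. It is an illustration under stated assumptions rather than the author's procedure: the function name and arguments are hypothetical, $S$ is taken with divisor n (the normal-theory MLE), every $S_{ii}$ is assumed non-singular, and the roots are ordered by increasing $|\Sigma_r| = (1-r)^{k-1}\{1+(k-1)r\}$ as in the theorem. Only the roots are returned; the normalization of the coefficient vectors is not attempted here.

```python
import numpy as np


def sample_new_gcc(x, sizes):
    """Sample roots r of |S - lambda* S_d| = 0, with r = (lambda* - 1)/(k - 1).

    x : (n, p) data matrix; sizes : group sizes p_1, ..., p_k (sum = p).
    Returns the roots ordered by increasing generalized variance |Sigma_r|.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    k = len(sizes)
    centred = x - x.mean(axis=0)
    s = centred.T @ centred / n  # MLE of Sigma (divisor n, not n - 1)
    s_d = np.zeros_like(s)       # block-diagonal part of S
    start = 0
    for p_i in sizes:
        sl = slice(start, start + p_i)
        s_d[sl, sl] = s[sl, sl]
        start += p_i
    # Eigenvalues of S_d^{-1} S are real because the matrix is similar
    # to the symmetric matrix S_d^{-1/2} S S_d^{-1/2}.
    lam_star = np.linalg.eigvals(np.linalg.solve(s_d, s)).real
    r = (lam_star - 1.0) / (k - 1)
    gen_var = (1.0 - r) ** (k - 1) * (1.0 + (k - 1) * r)
    return r[np.argsort(gen_var)]
```

A simple check on any output: since the trace of $S_d^{-1}S$ equals p, the $\lambda^*$'s sum to p and the roots $r$ therefore sum to zero.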


3.4  The Singular Case:

Very often, when the vector variable under consideration has a large number of components, the dispersion matrix for the variable turns out to be singular. One is then faced with the problem of defining the generalized correlations and associated variables. In particular, for the generalized canonical correlations the following result overcomes that difficulty.

Theorem 4.4.1  The generalized canonical correlations, for the new method, are given by $\rho = (\rho^* - 1)/(k-1)$, where the $\rho^*$'s are the non-zero roots of $|\Sigma_d^{-}\Sigma - \rho^* I| = 0$, $\Sigma_d^{-}$ being any g-inverse of $\Sigma_d$.

Proof: See the proof of Theorem 1, Sen Gupta (1980).

Various other generalizations of canonical correlations are properly defined for the singular case by the application of the same Theorem 1 in Sen Gupta (1980).
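The singular-case recipe can be illustrated with a small sketch. It assumes the Moore-Penrose inverse (`numpy.linalg.pinv`) as one particular choice of g-inverse, and a numerical tolerance `tol` for deciding which roots count as non-zero; both are choices made for this illustration, as is the function name.

```python
import numpy as np


def new_gcc_singular(sigma, sizes, tol=1e-8):
    """Roots rho = (rho* - 1)/(k - 1) from the non-zero roots rho* of
    |Sigma_d^- Sigma - rho* I| = 0, with Sigma_d^- the Moore-Penrose inverse.
    """
    sigma = np.asarray(sigma, dtype=float)
    k = len(sizes)
    sigma_d = np.zeros_like(sigma)
    start = 0
    for p_i in sizes:
        sl = slice(start, start + p_i)
        sigma_d[sl, sl] = sigma[sl, sl]
        start += p_i
    g_inv = np.linalg.pinv(sigma_d)  # one particular g-inverse
    roots = np.linalg.eigvals(g_inv @ sigma).real
    rho_star = roots[np.abs(roots) > tol]  # keep the non-zero roots only
    return np.sort((rho_star - 1.0) / (k - 1))
```

For example, if the first group contains the same variable twice (so that $\Sigma_{11}$ is singular) and the three distinct variables are equi-correlated with correlation 0.5, the three non-zero roots reproduce those of the non-redundant problem, $\rho = -0.25, -0.25, 0.5$.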


4.  STATISTICAL  INFERENCE  ASSOCIATED  WITH  A  SINGLE  SET  OF 
NEW  GENERALIZED  CANONICAL  VARIABLES 

4.1  Distribution of the new generalized canonical variables:

The exact distribution of even the canonical variables, in the special case of a multivariate normal population, is complicated and is of doubtful practical importance (see e.g. Kshirsagar, 1972). The situation here is even more complex and so only the large sample distribution will be studied here. The coefficient vector $a_i^{(s)}$, the induced MLE of $\alpha_i^{(s)}$, for $X_i$ with $s = 1$ will be of particular importance, $i = 1, \ldots, k$. To simplify matters, we further assume that the population is multivariate normal, because of the well known properties corresponding to linear functions of the components and the generalized variance of a random vector having a multivariate normal distribution. Since the estimated coefficient vector $a_i^{(s)}$ is obtained by the principle of induced MLE, properties of such estimators need to be considered. The asymptotic efficiency of induced MLEs under certain regularity conditions is implicitly given in the proof by Zacks (1971) of Theorem 5.4.2, and there it is shown that $g(\hat{\theta})$ can be taken as an induced MLE for $g(\theta)$ when $g(\theta)$ is a non one-one function. However, the case of the new GCVs is more complicated. We now have $\theta$ as a $k(k-1)/2$-component vector instead of a scalar, and $g(\theta)$, in addition to being non one-one, can be a scalar (for $\rho$) or a vector (for the coefficient vector $\alpha$). However, the induced MLEs $r^{(s)}$ and $a^{(s)}$ of $\rho^{(s)}$ and $\alpha^{(s)}$ respectively are expected to be good approximations to the corresponding population values under suitable comparisons. The validity of this approximation is shown below. We thus treat $a^{(s)}$, in large samples, as the true coefficient vector itself, consider it as a vector of constants and, for clarity, write it as $\beta^{(s)}$; $\beta^{(1)}$ will be written as $\beta$. Assuming that $X \sim N_p(0, \Sigma)$, then $Y' = (\beta_1'X_1, \ldots, \beta_k'X_k) = (Y_1, \ldots, Y_k) \sim N_k(0, \Sigma_{\rho^{(1)}})$, in large samples.

Assume that the non-zero roots are all distinct. Then from a theorem due to Hurwitz (see Appendix) it follows that there exists a $\delta$ such that there will be exactly the same number of non-zero eigenvalues for $A + E$ as for $A$ when $\|E\| < \delta$, where $\|\cdot\|$ is a matrix norm. Let $\lambda^*$ have the same rank among the eigenvalues of $A + E$ as $\lambda$ has among those of $A$. To make them unique, normalize the corresponding eigenvectors such that the first non-zero component of each is positive.

Consider without loss of generality the eigenvalue $\lambda^{*(1)}$ of $\Sigma\Sigma_d^{-1}$ (corresponding to $\rho^{(1)}$) and the corresponding MLE, say $g$. By a standard theorem in complex analysis $g$ is a continuous function of $(S, S_d)$, because $g$ is a continuous function of the coefficients obtained from the equation $|S - g S_d| = 0$ and in turn these coefficients are continuous functions of $(S, S_d)$. The coefficient vector $a^{(1)}$ corresponding to $g$, obtained by continuous operations on $g$ and the elements of $S$ and $S_d$, is also a continuous function of $(S, S_d)$. But $(S, S_d) \to (\Sigma, \Sigma_d)$ in probability, and so $a^{(1)}$, i.e. $\beta$, is a consistent estimator of $\alpha^{(1)}$; so, in large samples, the approximation considered above is meaningful.


4.2  Tests for the equi-correlation coefficient $\rho^{(s)}$

The value of $\rho^{(s)}$, by itself, may be of interest. It may also be compared with the first canonical correlation when a sub-division of $X$ into two groups also seems meaningful. Suppose $X_1, \ldots, X_n$ constitute an independent sample from $N_p(0, \Sigma)$ and let $Y^{(1)}$ be the sample first new GCV. Then from section 4.1, in large samples $Y^{(1)} \sim N_k(0, \Sigma_\rho)$. Let $I$ and $E$ be the identity matrix and the matrix with all elements equal to unity. Then

$$
\Sigma_\rho = (1-\rho) I + \rho E; \qquad \Sigma_\rho^{-1} = (1-\rho)^{-1} I - \rho (1-\rho)^{-1} \{1 + (k-1)\rho\}^{-1} E = (c_{ij}),
$$

where $c_{ii} = \{1 + (k-2)\rho\} / (1-\rho)\{1 + (k-1)\rho\}$ and $c_{ij} = -\rho / (1-\rho)\{1 + (k-1)\rho\}$, $i \ne j$.

Hence the density function for non-singular $\Sigma_\rho$ can be written as

$$
f(y; \rho) = \frac{1}{(2\pi)^{k/2} |\Sigma_\rho|^{1/2}} \exp\left[ -\frac{1}{2} \left\{ \frac{\sum y_i^2}{1-\rho} + \frac{(\sum y_i)^2 (-\rho)}{\{1 + (k-1)\rho\}(1-\rho)} \right\} \right] \qquad (4.2.1)
$$

$-\infty < y_i < \infty$, $i = 1, \ldots, k$, and $\rho \ne 1$, $\rho \ne -1/(k-1)$.

The above representation is particularly useful because it shows that

(i) there does not exist any one-dimensional sufficient statistic for $\rho$,

(ii) $\sum (y_i - \bar{y})^2$ and $\bar{y} = \sum y_i / k$ are independent, and

(iii) the part of the exponent within the second bracket is monotonically decreasing in $\rho$ with positive probability.
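The closed forms for $\Sigma_\rho^{-1}$, $|\Sigma_\rho|$ and the exponent of (4.2.1) are easy to verify numerically. The short sketch below (Python with NumPy; all function names are illustrative) builds each closed form so that it can be compared against a direct matrix computation.

```python
import numpy as np


def equicorr(rho, k):
    """Sigma_rho = (1 - rho) I + rho E."""
    return (1.0 - rho) * np.eye(k) + rho * np.ones((k, k))


def equicorr_inv(rho, k):
    """Closed-form inverse: I/(1-rho) - rho E / [(1-rho){1+(k-1)rho}]."""
    denom = (1.0 - rho) * (1.0 + (k - 1) * rho)
    return np.eye(k) / (1.0 - rho) - (rho / denom) * np.ones((k, k))


def equicorr_det(rho, k):
    """|Sigma_rho| = (1-rho)^(k-1) {1+(k-1)rho}."""
    return (1.0 - rho) ** (k - 1) * (1.0 + (k - 1) * rho)


def log_density(y, rho):
    """Log of the density (4.2.1) at a single observation y."""
    y = np.asarray(y, dtype=float)
    k = y.size
    denom = (1.0 - rho) * (1.0 + (k - 1) * rho)
    quad = np.sum(y ** 2) / (1.0 - rho) - rho * np.sum(y) ** 2 / denom
    return (-0.5 * k * np.log(2.0 * np.pi)
            - 0.5 * np.log(equicorr_det(rho, k))
            - 0.5 * quad)
```

The density depends on the data only through $\sum y_i^2$ and $(\sum y_i)^2$, which is what the representation (4.2.1) makes explicit.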

We want to test $H_0: \rho = \rho_0$ against $H_1: \rho < (>)\ \rho_0$ or against $H_2: \rho \ne \rho_0$.

Likelihood ratio test: For testing $H_0$ against $H_2$ the LRT is derived below. We also write $\rho = \rho^{(1)}$ for simplicity, as stated at the outset, and also suppose the original sample of size n is split into m independent subsamples, giving rise to m independent first new GCVs. The LRT can be performed even if $m = 1$. The likelihood function can be written easily from (4.2.1) above, and differentiating it with respect to $\rho$ and equating the derivative to zero we have

$$
0 = g(\rho) = (k-1)k\rho(1-\rho)\{1 + (k-1)\rho\} + \sum_j \sum_i y_{ji}^2 \{1 + (k-1)\rho\}^2 + \sum_j \Big( \sum_i y_{ji} \Big)^2 \{1 + (k-1)\rho^2\},
$$

where $y_j' = (y_{j1}, \ldots, y_{jk})$ is the sample first new GCV obtained from the j-th sub-sample, $j = 1, \ldots, m$. This is a cubic equation in $\rho$, two of whose roots may be complex. Now, $g\{-1/(k-1)\}$, $g(1)$ and $g(0)$ are all positive (with probability one). Thus, it is not very obvious from $g(\rho)$ that an admissible solution for $\rho$, say $\tilde{\rho}$, where $-1/(k-1) < \tilde{\rho} < 1$, will always exist. [However, in various cases admissible solutions may exist, e.g. if $\sum\sum y_{ji}^2 < 1/2$ and $\sum (\sum y_{ji})^2 < 1/2$ then $g(-1/k)$ is negative. Hence, an admissible solution here lies in $(-1/(k-1), -1/k]$.] However, considering the likelihood function directly, it can be shown that there exists at least one such admissible solution. Further, in cases of several admissible solutions (at most three), by the principle of Maximum Likelihood, we choose as the MLE of $\rho$ that which corresponds to the largest value of the likelihood function and call it $\hat{\rho}$. Thus we get the following


Theorem 1: Let $Y \sim N_k(0, \Sigma_\rho)$. Then if $Y_1, \ldots, Y_m$ constitute an independent random sample from the above population, the Likelihood Ratio Test for testing $H_0: \rho = \rho_0$ against the alternative $\rho \ne \rho_0$ is given by

$$
\text{Reject } H_0 \text{ iff } \lambda = \left( |\Sigma_{\hat{\rho}}| / |\Sigma_{\rho_0}| \right)^{m/2} \exp\left[ -\tfrac{1}{2} \left\{ a \{ f_1(\rho_0) - f_1(\hat{\rho}) \} + b \{ f_2(\rho_0) - f_2(\hat{\rho}) \} \right\} \right] < K, \text{ a constant,}
$$

where $\hat{\rho}$ is the MLE of $\rho$, $a = \sum_j \sum_i y_{ji}^2$, $b = \sum_j (\sum_i y_{ji})^2$, $f_1(\rho) = 1/(1-\rho)$, $f_2(\rho) = -\rho / \{1 + (k-1)\rho\}(1-\rho)$ and $K$ is a constant to be determined so that the level of the test meets the specified value.

Under $H_0$, for large m, $-2 \ln \lambda \sim \chi^2_1$.

The exact distributions of $\hat{\rho}$ and of the LRT statistic seem quite complicated.
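Because the text notes that the likelihood can be maximized directly, a numerical sketch may help. The code below is an illustration, not the author's procedure: it evaluates the log-likelihood built from (4.2.1) on a grid over the admissible interval $(-1/(k-1), 1)$, takes the grid maximizer as an approximate MLE, and returns $-2 \ln \lambda$. The grid search, the grid size and all function names are assumptions made for this example.

```python
import numpy as np


def loglik(rho, y):
    """Log-likelihood of m independent N_k(0, Sigma_rho) rows of y.

    rho may be a scalar or an array of candidate values.
    """
    y = np.asarray(y, dtype=float)
    m, k = y.shape
    a = np.sum(y ** 2)              # sum_j sum_i y_ji^2
    b = np.sum(y.sum(axis=1) ** 2)  # sum_j (sum_i y_ji)^2
    log_det = (k - 1) * np.log(1.0 - rho) + np.log(1.0 + (k - 1) * rho)
    quad = a / (1.0 - rho) - rho * b / ((1.0 - rho) * (1.0 + (k - 1) * rho))
    return -0.5 * m * k * np.log(2.0 * np.pi) - 0.5 * m * log_det - 0.5 * quad


def mle_rho(y, n_grid=3001):
    """Grid approximation to the MLE of rho on (-1/(k-1), 1)."""
    y = np.asarray(y, dtype=float)
    k = y.shape[1]
    grid = np.linspace(-1.0 / (k - 1), 1.0, n_grid)[1:-1]  # open interval
    return grid[np.argmax(loglik(grid, y))]


def lrt_statistic(y, rho0):
    """-2 ln(lambda) for H0: rho = rho0 against rho != rho0."""
    return 2.0 * (loglik(mle_rho(y), y) - loglik(rho0, y))
```

In use, `lrt_statistic(y, rho0)` would be referred to the $\chi^2_1$ distribution for large m, in line with the theorem; a finer grid or a one-dimensional optimizer could replace the grid search.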

Due to the above difficulties with the LRT, a new test based on the Best Unbiased Estimator of $\rho$ has been considered in Sen Gupta (1981b). The exact null and non-null distributions of the test statistic have been obtained in terms of Kummer's function and the test has been shown to be unbiased against one-sided alternatives. The distribution in (5.2.1)

4.3  Tests for a specified value of the generalized variance of the new generalized canonical variables.

Since the criterion for optimization is the generalized variance, a test for the generalized variance is needed here to judge the performance of the new GCVs. The Likelihood Ratio Test for the generalized variance, for the special structure of the dispersion matrix here, admits of some simplifications. If $|\Sigma_\rho| = \sigma_0^2 < 1$, then there are precisely two real distinct solutions, $\rho_1 > 0 > \rho_2$, say.

Then (using the MLE of $\rho$ with $m = 1$ in Section 4.2) we have, by an application of the results in Sen Gupta (1981b),

Theorem 1. The Likelihood Ratio Test for $H_0: |\Sigma_\rho| = \sigma_0^2$ against $H_1: |\Sigma_\rho| \ne \sigma_0^2$ is given by

$$
\text{Reject } H_0 \text{ iff } \lambda = \left( |\Sigma_{\hat{\rho}}| / \sigma_0^2 \right)^{k/2} \exp\left\{ -\tfrac{1}{2} Y' \left( \Sigma_{\rho^*}^{-1} - \Sigma_{\hat{\rho}}^{-1} \right) Y \right\} < C,
$$

where $\hat{\rho}$ is the MLE of $\rho$, $\rho^*$ is such that $f(Y; \rho^*) = \max_{\rho_1, \rho_2} f(Y; \rho)$ and $C$ is a constant to be determined such that the test has the desired level.


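If the original sample is split into m independent subsamples, so that $Y_j$, $j = 1, \ldots, m$, are new GCVs based on the j-th subsample, then

$$
\text{Reject } H_0 \text{ iff } \lambda_m = \left( |\Sigma_{\hat{\rho}}| / \sigma_0^2 \right)^{mk/2} \exp\left\{ -\tfrac{1}{2} \sum_{j=1}^{m} Y_j' \left( \Sigma_{\rho^{**}}^{-1} - \Sigma_{\hat{\rho}}^{-1} \right) Y_j \right\} < C',
$$

where $\rho^{**}$ is such that $\prod_{j=1}^{m} f(Y_j; \rho^{**}) = \max_{\rho_1, \rho_2} \prod_{j=1}^{m} f(Y_j; \rho)$, $\hat{\rho}$ is the MLE of $\rho$ and, under $H_0$, for large m, $-2 \ln \lambda_m \sim \chi^2_1$.

For other interesting tests for the generalized variance $|\Sigma_\rho|$, see Sen Gupta (1981b).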


5.  COMPARISON  OF  SEVERAL  NEW  GENERALIZED  CANONICAL  VARIABLES 
5.1  Tests  for  equality  of  multidimensional  scatter  of  two  new  GCVs: 

As in Section 4 we will consider only the first new GCV. The problems that arise when the random variable X is grouped in two different ways are related to (a) the same number of groups with the same number of elements in each, (b) the same number of groups with different numbers of elements and (c) different numbers of groups. Since (a) and (b) give rise to the same number of components in the corresponding first new GCVs, for testing purposes they can be tackled in a similar manner.

A random sample of size n, $X_1, \ldots, X_n$, is taken from $N_p(0, \Sigma)$. Let $Y$ and $Z$ be first new GCVs of order r and r-1 respectively. Under the approximations of Section 4.1 construct $Y_i' = (\beta_{Y1}'X_{1i}, \ldots, \beta_{Yr}'X_{ri})$ and $Z_i' = (\beta_{Z1}'X^*_{1i}, \ldots, \beta_{Z,r-1}'X^*_{r-1,i})$ for each i, $i = 1, \ldots, n$, where $X' = (X_1', \ldots, X_r')$ and $X' = (X_1^{*\prime}, \ldots, X_{r-1}^{*\prime})$ are the partitions of $X$ into orders r and r-1 corresponding to $Y$ and $Z$ respectively. Then $(Y_1, \ldots, Y_n)$ and $(Z_1, \ldots, Z_n)$ can be considered, each separately, as a sample of GCVs corresponding to $Y$ and $Z$ respectively. But $Y_i$ and $Z_i$ will still be dependent. This dependency can be avoided if a sub-sample $X_1, \ldots, X_m$ of size m is used to estimate $\Sigma$ and then obtain $Y_i$, $i = 1, \ldots, m$, from this estimate and sub-sample, and the remaining sample $X_j$, $j = m+1, \ldots, n$, is used to give an independent estimate of $\Sigma$ and obtain $Z_j$, $j = 1, \ldots, (n-m)$. A LRT can then be performed for the equality of the generalized variances of $Y$ and $Z$, using advantageously the independence of the $Y_i$'s and $Z_j$'s. The same method can be used for more than two new GCVs. However, this procedure will be quite inefficient because $\Sigma$ will be estimated from small samples and so, of course, will be the coefficient vectors. Further, the number of $Y_i$'s and $Z_i$'s will be greatly reduced. A modified method, which uses the same number of GCVs $Y_i$'s and $Z_i$'s as that of the $X_i$'s, namely n, and where $\Sigma$ is estimated using the entire sample of size n, is proposed below. The method is also applicable in the case of several new GCVs.


Lemma 1: If $X_1, \ldots, X_n$ are i.i.d. and $Y_1, \ldots, Y_n$ are i.i.d., $X_i$ is independent of $Y_j$, $i \ne j$, and the $X_i$'s and $Y_i$'s have multivariate normal distributions, then $\tilde{X} = \sum a_i X_i$ and $\tilde{Y} = \sum b_i Y_i$ are independent if $\sum a_i b_i = 0$.

Proof. $\mathrm{Cov}(\tilde{X}, \tilde{Y}) = \sum_i \sum_j a_i b_j \mathrm{Cov}(X_i, Y_j) = \sum_i a_i b_i \mathrm{Cov}(X_i, Y_i) = \mathrm{Cov}(X, Y) \cdot 0 = 0$, and hence the lemma follows from the normality of $X$ and $Y$.
Let $Y_j^* = \sum_{s=1}^{r} Y_{js}$, $Y^{**} = \sum_{j=1}^{n} a_j Y_j^*$; $Z_j^* = \sum_{s=1}^{r-1} Z_{js}$, $Z^{**} = \sum_{j=1}^{n} b_j Z_j^*$.

Choose $a_j$, $b_j$ such that $\sum_j a_j b_j = 0$. Also, if $Y \sim N_r(0, \Sigma_{\rho_r})$ and $Z \sim N_{r-1}(0, \Sigma_{\rho_{r-1}})$, then $\mathrm{Var}(Y^{**}) = r\{1 + (r-1)\rho_r\}$ and $\mathrm{Var}(Z^{**}) = (r-1)\{1 + (r-2)\rho_{r-1}\}$, where we have chosen $a_j$, $b_j$ such that $\sum a_j^2 = \sum b_j^2 = 1$. Call the resulting variables $Y^{**}$ and $Z^{**}$. Using the approximations of section 4.1 and by Lemma 1, in large samples $Y^{**}$ and $Z^{**}$ are independently distributed.
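One concrete choice of weights satisfying $\sum a_j^2 = \sum b_j^2 = 1$ and $\sum a_j b_j = 0$ is sketched below. It is only an example of an admissible choice and assumes n is even: all $a_j = 1/\sqrt{n}$ and $b_j = \pm 1/\sqrt{n}$ with alternating signs. The function names are hypothetical.

```python
import numpy as np


def contrast_weights(n):
    """Weights with sum(a^2) = sum(b^2) = 1 and sum(a*b) = 0 (n must be even)."""
    if n % 2 != 0:
        raise ValueError("this particular construction needs an even n")
    a = np.full(n, 1.0 / np.sqrt(n))
    signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    return a, a * signs


def combined_scores(y, z):
    """Return (Y**, Z**) from an n x r array y and an n x (r-1) array z."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    a, b = contrast_weights(y.shape[0])
    y_star = y.sum(axis=1)  # Y_j* = sum over the r components
    z_star = z.sum(axis=1)  # Z_j* = sum over the r-1 components
    return a @ y_star, b @ z_star
```

Both weight vectors have squared entries equal to 1/n; any other pair of orthonormal weight vectors would serve Lemma 1 equally well.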


We want to test $H_0: \sigma_1 = \sigma_2$, where $\sigma_1 = |\Sigma_{\rho_r}|^{1/r}$ and $\sigma_2 = |\Sigma_{\rho_{r-1}}|^{1/(r-1)}$, against $H_1: \sigma_1 < \sigma_2$ or $H_2: \sigma_1 \ne \sigma_2$. For testing $H_0$ against $H_2$, we use the LRT. Differentiating the likelihood function with respect to $\rho_r$ and $\rho_{r-1}$ and equating the derivatives to zero, we have

$y^{**2} = r\{1 + (r-1)\rho_r\}$ and $z^{**2} = (r-1)\{1 + (r-2)\rho_{r-1}\}$ (estimates of the variances).

Then, under $\Omega$, the maximum of the likelihood function $L(y^{**}, z^{**})$ is $1/(y^{**} z^{**})$.

Under $H_0$, we have to maximize the likelihood function and thus consider $L^*(y^{**}, z^{**}) = L(y^{**}, z^{**}) + \lambda(\sigma_1 - \sigma_2)$, where $\lambda$ is Lagrange's multiplier. The solution seems to need numerical or iterative methods. With these solutions, let Max $L$ be $f(y^{**}, z^{**})$, a function of $y^{**}$ and $z^{**}$.

Reject $H_0$ iff $\eta = f(y^{**}, z^{**}) / (y^{**} z^{**}) < \eta_0$, a constant.

Under $H_0$, $-2 \ln \eta$ is approximately distributed as a $\chi^2$ variable with 1 d.f.
o 

So far, except for the conditions (i) $\sum a_i^2 = \sum b_i^2 = 1$ and (ii) $\sum a_i b_i = 0$, the $a_i$, $b_i$ are arbitrary. Since the statistic in the LRT looks formidable, the power function of the above test may be difficult to compute. However, we

may  try  to  improve  on  the  estimators,  ck,  i  =  r-l,r-2,  obtained  above 
**  *  * 

when  L(y  ,z  )  was  maximized  under  ft,  are  unbiased  estimates.  We  will  try 

to find minimum variance unbiased estimators among the above class. This amounts to minimizing $\sum a_i^4$ and, in view of condition (i) above, subject to $\sum a_i^2 = 1$. The solution is easily found to be $a_i^2 = 1/n$, $i = 1, \ldots, n$.

5.2  Tests for equality of multidimensional scatter of more than two new generalized canonical variables:

Here we will consider separately GCVs with the same and with different numbers of components, because some simplifications are available in the former case.

(a) GCVs with the same number of components and with equi-correlations all > 0 or all < 0: Consider the case of s (say, even) GCVs with the same number of components r and equi-correlation coefficients $\rho_i$, $i = 1, \ldots, s$, all > 0 or all < 0. Take s/2 independent sub-samples of the same size and construct $(y_1^{**}, y_2^{**}), \ldots, (y_{s-1}^{**}, y_s^{**})$ as mentioned in section 5.1. Thus the $y_i^{**}$'s are independent normal variables. Then the likelihood function is given by $L$, where, writing $y_i = y_i^{**}$,

$$
\ln L = C - \tfrac{1}{2} \sum_i \left[ \ln\{ r(1 + (r-1)\rho_i) \} + y_i^2 / \{ r(1 + (r-1)\rho_i) \} \right], \quad \text{where } C = -\tfrac{s}{2} \ln(2\pi).
$$

Then, in the notation of Barlow et al. (1972), letting

$X = \{1, \ldots, s\}$, $g(i) = y_i^2$, $f(i) = r\{1 + (r-1)\rho_i\}$ and $w(i) = 1$, $i = 1, \ldots, s$,

we have the

Lemma 1. The isotonic regression $g^*$ maximizes $L$ under $H_1$: the generalized variance of the i-th GCV is greater than that of the j-th GCV, $i > j$.

Proof: Note first that since the equi-correlations are all < 0 (all > 0), an
ordering of the generalized variances of the new GCVs gives the same (reverse)
ordering of the corresponding equi-correlation coefficients. With the above
values of $X$, $g(i)$, $f(i)$ and $w(i)$, we use Theorem 1.10 of Barlow et al,
quoted in our Appendix, and the relevant notation. Let $\Phi(u) = -\ln u$. To
complete the proof it suffices to note that maximizing $\ln L$ is equivalent to
minimizing $\sum_i [\ln f(i) + \{g(i)/f(i)\}]$, both subject to $f$ isotonic,
since the first and last terms of $\Delta$ do not involve $f$. (When
$\rho_i > 0\ \forall i$, isotonicity is w.r.t. the reverse of the natural order
on $X$ and for computational purposes a formula is

$$g_i^* = \min_{s \le i}\ \max_{t \ge i}\ \frac{1}{t-s+1} \sum_{j=s}^{t} y_j^2 .$$

When $\rho_i < 0\ \forall i$, min and max are interchanged in the above formula.)
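
The min-max formula can be evaluated directly for small $s$. The following is
a minimal sketch under stated assumptions: equal weights $w(i) = 1$, block
averages of the $y_j^2$ as the quantity being minimized and maximized, and
made-up input values. The function names are illustrative and do not come from
Barlow et al.

```python
def _block_means(g):
    """Return avg(s, t), the mean of g[s..t] inclusive, using prefix sums."""
    prefix = [0.0]
    for value in g:
        prefix.append(prefix[-1] + value)

    def avg(s, t):
        return (prefix[t + 1] - prefix[s]) / (t - s + 1)

    return avg


def antitonic_minmax(g):
    """Non-increasing fit: min over s <= i of max over t >= i of mean(g[s..t])."""
    n = len(g)
    avg = _block_means(g)
    return [min(max(avg(s, t) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


def isotonic_maxmin(g):
    """Non-decreasing fit: the same formula with min and max interchanged."""
    n = len(g)
    avg = _block_means(g)
    return [max(min(avg(s, t) for t in range(i, n)) for s in range(i + 1))
            for i in range(n)]


y_squared = [4.0, 2.0, 3.0, 1.0]             # made-up values of y_i^2
fit_down = antitonic_minmax(y_squared)       # fit under a decreasing order
fit_up = isotonic_maxmin(y_squared[::-1])    # fit under an increasing order
print(fit_down, fit_up)
```

The brute-force evaluation costs on the order of $s^3$ operations, which is
adequate for a handful of GCVs; a pooling algorithm is preferable for long
sequences.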


Using Lemma 1 and the fact that under $H_0$ Max $L$ is obtained by simply
replacing $1+(r-1)\rho_i$ by $y_i^2$ in $L$, the LRT statistic $\lambda$ can be
easily found. However, even under $H_0$ the distribution of the statistic
seems to be complicated. In large samples, $-2 \ln\lambda \sim \bar\chi^2$
(see Barlow et al, p. 198).
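
Once the fitted values of $f(i)$ under the two hypotheses are available,
$-2\ln\lambda$ is a difference of two log-likelihoods in which the constant $C$
cancels. The sketch below is an illustration only. It assumes the null fit is a
single common value of $f(i)$ estimated by the mean of the $y_i^2$, takes the
ordered fit as given, and uses made-up numbers; the function names are
hypothetical.

```python
import math


def profile_loglik(y_squared, f):
    """ln L without the constant C: -1/2 * sum(ln f_i + y_i^2 / f_i)."""
    return -0.5 * sum(math.log(fi) + gi / fi for gi, fi in zip(y_squared, f))


def neg2_log_lambda(y_squared, f_null, f_alt):
    """-2 ln(lambda) = 2 * [ln L at the ordered fit - ln L at the null fit]."""
    return 2.0 * (profile_loglik(y_squared, f_alt)
                  - profile_loglik(y_squared, f_null))


y_squared = [4.0, 2.0, 3.0, 1.0]            # made-up values of y_i^2
f_alt = [4.0, 2.5, 2.5, 1.0]                # non-increasing fit of y_squared
common = sum(y_squared) / len(y_squared)    # assumed null fit: one common value
f_null = [common] * len(y_squared)

statistic = neg2_log_lambda(y_squared, f_null, f_alt)
print(round(statistic, 6))
```

The value obtained would then be referred to the $\bar\chi^2$ tables mentioned
in the text.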

(b) GCVs with different numbers of components: In view of the above
difficulties even in the special cases as in (a), we propose to use the
approximation to the distribution of the generalized variance by a Gamma
distribution due to Hoel (1937). Barlow et al. (p. 198) have considered a
similar case and with a slight modification it follows that the LRT statistic
satisfies $-2 \ln\lambda \sim \bar\chi^2$. We reject the hypothesis of equal
generalized variances in favor of a given ordered relationship among them if
$-2 \ln\lambda$ exceeds the upper $\alpha$ percent point of the distribution of
$\bar\chi^2$. Some tables are given in Barlow et al. For this test, however,
see comments in Section 7.


6.  DISCUSSIONS  AND  AN  EXAMPLE 


In this chapter new GCVs are found for the classical example due
to Thurstone and Thurstone mentioned in Section 2.3 and studied by Horst,
Kettenring, Seal and Gnanadesikan. The correlation matrix, R, for the 9
variables we consider here is given by

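$$R = \begin{pmatrix} I & R_{12} & R_{13} \\ R_{12}' & I & R_{23} \\ R_{13}' & R_{23}' & I \end{pmatrix},$$

$$R_{12} = \begin{pmatrix} .636 & .126 & .059 \\ -.021 & .633 & .049 \\ .016 & .157 & .521 \end{pmatrix}, \quad
R_{13} = \begin{pmatrix} .626 & .195 & .059 \\ .035 & .459 & .129 \\ .048 & .238 & .426 \end{pmatrix}, \quad
R_{23} = \begin{pmatrix} .709 & .050 & -.002 \\ .039 & .532 & .190 \\ .067 & .258 & .299 \end{pmatrix},$$

where I is the 3x3 identity matrix.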

Here $k = 3$ and $p = 9$. Since the block-diagonal part of $R$ is $I$ (the 9x9
identity matrix), we need only solve $|R - \lambda I| = 0$, i.e. determine the
characteristic roots, say $\hat\lambda_i$, of $R$ and then find
$\hat\rho_i = (\hat\lambda_i - 1)/2$, $i = 1,\dots,9$. The coefficient vector
for the $s$-th new GCV is the characteristic vector corresponding to
$\hat\lambda^{(s)}$, say, where the corresponding equi-correlation coefficient
$\hat\rho^{(s)}$ is the one which gives the $s$-th minimum value among
$(1 + 2\hat\rho_i)(1 - \hat\rho_i)^2$, $i = 1,\dots,9$, $s = 1,\dots,9$.
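
The recipe above (characteristic roots of $R$, then $(\lambda - 1)/2$, then
ordering by $(1 + 2\rho)(1 - \rho)^2$) can be sketched with the standard
library alone. The cyclic Jacobi rotation routine below is a generic textbook
eigenvalue method chosen here for self-containment; it is not the procedure
used in the report, and all function and variable names are illustrative.

```python
import math

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R12 = [[0.636, 0.126, 0.059], [-0.021, 0.633, 0.049], [0.016, 0.157, 0.521]]
R13 = [[0.626, 0.195, 0.059], [0.035, 0.459, 0.129], [0.048, 0.238, 0.426]]
R23 = [[0.709, 0.050, -0.002], [0.039, 0.532, 0.190], [0.067, 0.258, 0.299]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def assemble(blocks):
    """Stack a 3x3 grid of 3x3 blocks into one 9x9 list-of-lists matrix."""
    return [sum((blk[i] for blk in block_row), [])
            for block_row in blocks for i in range(3)]


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-22):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that annihilates a[p][q], folded into [-pi/4, pi/4].
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # update columns p and q (A <- A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # update rows p and q (A <- J' A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


R = assemble([[I3, R12, R13],
              [transpose(R12), I3, R23],
              [transpose(R13), transpose(R23), I3]])

k = 3                                             # number of sets
roots = jacobi_eigenvalues(R)
rho = [(lam - 1.0) / (k - 1) for lam in roots]
gen_var = [(1.0 + (k - 1) * r) * (1.0 - r) ** (k - 1) for r in rho]
stages = sorted(zip(gen_var, rho, roots))         # smallest generalized variance first
for s, (gv, r, lam) in enumerate(stages[:4], start=1):
    print(f"stage {s}: root={lam:.5f}  rho={r:.3f}  gen.var={gv:.2f}")
```

The printed rows can be set beside Table 1; small differences in the last
digit may arise from rounding in the published correlations.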
Table 1 exhibits the order of the first four new GCVs and the
corresponding generalized variances. We note that (1) the new GCCs
are computed quite easily, e.g. $r^{(1)} = .745$. For the GENVAR method
and several others discussed by Steel and Kettenring, however, both the
GCCs and the coefficients of the GCVs for each stage need to be computed
through extensive iterative methods. (2) For a negative value of $r$,
i.e. $r = -.385$, we have the third new GCV, and (3) since $|\hat\Sigma|$ for
the third new GCV is as high as .45, it seems that the higher order new GCVs
are redundant from practical considerations.

The computational aspect of the new GCVs is quite interesting. At
the first stage, for the coefficients $a_i^{(1)}$, one needs to solve a
simultaneous system of non-linear equations

$$a_i^{(1)\prime} S_{ii}\, a_i^{(1)} = 1, \qquad a_i^{(1)\prime} S_{ij}\, a_j^{(1)} = r^{(1)}, \qquad i \ne j,\ \ i,j = 1,\dots,k,$$

where $r^{(1)}$ has already been obtained (as above) from (2) of p. 20. $S$ can
first be transformed to a (block) correlation matrix $R$, as done by Horst for
our above example. The conditions then become

$$b_i^{(1)\prime} b_i^{(1)} = 1, \qquad b_i^{(1)\prime} R_{ij}\, b_j^{(1)} = r^{(1)}, \qquad (b_i^{(1)} = S_{ii}^{1/2} a_i^{(1)}), \qquad i \ne j.$$

Further, a polar transformation on $b_i^{(1)}$ reduces by one the number of
unknown variables in each set. The usual Gauss-Seidel method, which determines
one variable from each equation, needs to be modified for our purpose. Suppose
an initial value ${}_0b^{(1)}$ of $b^{(1)}$ is given. Usually a good initial
value is ${}_0b_i^{(1)} = b_i^{(1)} / (b_i^{(1)\prime} b_i^{(1)})^{1/2}$,
$i = 1,\dots,k$, where $b^{(1)\prime} = (b_1^{(1)\prime}, \dots, b_k^{(1)\prime})$
is the eigenvector of $R$ corresponding to $r^{(1)}$. We solve for
${}_1b_2^{(1)}$ from ${}_0b_1^{(1)\prime} R_{12}\, {}_1b_2^{(1)} = r^{(1)}$,
giving ${}_1b_2^{(1)}$; for ${}_1b_3^{(1)}$ from
${}_1b_2^{(1)\prime} R_{23}\, {}_1b_3^{(1)} = r^{(1)}$, giving ${}_1b_3^{(1)}$;
and so on, till finally for ${}_1b_1^{(1)}$ from
${}_1b_k^{(1)\prime} R_{k1}\, {}_1b_1^{(1)} = r^{(1)}$, giving ${}_1b_1^{(1)}$.
The ${}_1b_i^{(1)}$'s are standardized to have unit length, yielding the
coefficient vector at the end of the first iteration as ${}_1b^{(1)}$, using
the same notation, from ${}_0b^{(1)}$. The procedure is then further iterated.
Since $r^{(1)}$ is known previously, the iteration can be terminated at the
$n$-th stage if

$$\sum_{i=1}^{k} \left| {}_nb_i^{(1)\prime} R_{i,i+1}\, {}_nb_{i+1}^{(1)} - r^{(1)} \right| < \epsilon$$

for some pre-assigned $\epsilon$, where the suffix $k+1$ is replaced by 1. It
is reasonable to make a preliminary polar transformation on $b_i^{(1)}$, each
separately for the $i$-th set, $i = 1,\dots,k$, as suggested earlier. The
entire procedure outlined above can then be performed in terms of the
transformed variables. The ZXMIN subroutine in IMSL can then be advantageously
exploited to solve for $b^{(1)}$. This procedure is applicable to the
coefficients for higher stage new GCVs also, $b^{(2)}, b^{(3)}, \dots$; only
the equations defining the additional constraints at the corresponding stages
now also have to be solved simultaneously.

For our example, a single iteration of the above procedure yielded

$${}_1b^{(1)\prime} = \bigl( (.642691,\ .619972,\ .450091),\ \ (.543281,\ .733599,\ .408263),\ \ (.668885,\ .663320,\ .335559) \bigr)$$

and the three values of ${}_1b_i^{(1)\prime} R_{ij}\, {}_1b_j^{(1)}$, $i \ne j$,
as .745, .748 and .749, the value of $r^{(1)}$ being .745 as determined at the
outset.
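
The two conditions on the $b_i$ (unit length, and pairwise values close to
$r^{(1)}$) can be evaluated directly for the printed vectors. The sketch below
assumes that the three coefficient triples are listed in the order of the
three sets and that the off-diagonal blocks of $R$ are as given earlier in
this section; the helper name `bilinear` is illustrative.

```python
R12 = [[0.636, 0.126, 0.059], [-0.021, 0.633, 0.049], [0.016, 0.157, 0.521]]
R13 = [[0.626, 0.195, 0.059], [0.035, 0.459, 0.129], [0.048, 0.238, 0.426]]
R23 = [[0.709, 0.050, -0.002], [0.039, 0.532, 0.190], [0.067, 0.258, 0.299]]

# Coefficient vectors after one iteration, assumed to be in set order 1, 2, 3.
b = [[0.642691, 0.619972, 0.450091],
     [0.543281, 0.733599, 0.408263],
     [0.668885, 0.663320, 0.335559]]


def bilinear(u, m, v):
    """Compute u' M v for list-based vectors and a list-of-lists matrix."""
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


# Squared lengths b_i' b_i, which should each be close to 1.
squared_lengths = [sum(x * x for x in bi) for bi in b]

# Pairwise quantities b_i' R_ij b_j for the three pairs of sets.
pairwise = {
    (1, 2): bilinear(b[0], R12, b[1]),
    (1, 3): bilinear(b[0], R13, b[2]),
    (2, 3): bilinear(b[1], R23, b[2]),
}

r1 = 0.745                                   # first-stage value from the text
total_deviation = sum(abs(v - r1) for v in pairwise.values())

print([round(x, 6) for x in squared_lengths])
print({pair: round(v, 3) for pair, v in pairwise.items()})
print(round(total_deviation, 3))
```

The total deviation plays the role of the left-hand side of a stopping rule of
the kind described above, to be compared with a chosen tolerance.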

It is seen from this example that, unlike several of the more reasonable
methods studied in detail by Kettenring, the new GCVs are quite
straightforward and easy to compute. We have merely to find the characteristic
roots and the standardized eigenvectors of a given matrix. From the
computational point of view the gain is significant. Also, we have already
seen from Sections 4 and 5 several advantages of new GCVs from the
point of view of statistical inference.


Table 1. $|\hat\Sigma^{(s)}|$ for ordered new GCVs

  $\hat\lambda$    $r^{(s)}$    $|\hat\Sigma^{(s)}|$    new GCV
  2.48985           .745         .16                    first stage
  2.16422           .582         .33                    second stage
   .23548          -.385         .45                    third stage
  1.61986           .310         .77                    fourth stage


7.  TOPICS  FOR  FUTURE  RESEARCH 


The tests based on the approximation to the distribution of the
generalized variance by a gamma distribution may, with further research,
possibly be improved upon, since this approximation does not take into account
the additional information of the equi-correlation structure of the dispersion
matrix. This leads to the estimation and exact distributional problems
associated with the generalized variance corresponding to an equi-correlated
dispersion matrix. Hence, new directions need to be sought for these
problems and, once the solutions are obtained, the problems of statistical
inference presented in this report can be reconsidered with these
modifications. A simulation study comparing the modified methods with the
present one will be worth attempting. Properties of the proposed tests
may also be considered in more detail.

Relationships of GCVs with such important fields as Time Series
Analysis, Regression Analysis, Prediction Theory, MANOVA, Discriminant
Analysis, and Scaling and Factor Analysis are topics for future research.
These are expected to achieve simplifications in terms of cost and
analysis. The case with nominal variables (used extensively in Bio-Medical
sciences) and a Ranking and Selection Procedure based on GCVs are
interesting problems for further investigation.

Statistical inference associated with the previous GCVs, and with
generalized variances in general, are important topics for future research.
For LRTs for generalized variances and their associated properties see
Sen Gupta (1981c). Such tests are applicable to generalized canonical
variables obtained by minimizing the generalized variance, of general
structure, as considered by Anderson (1958), Kettenring (1971), Steel
(1951) and others.
APPENDIX


A.1 Theorem 1: Let $A$ be positive definite and $B$ be positive semi-definite.
Then the roots of $B$ in the metric of $A$ are non-negative (where the roots
of the equation $|\lambda A - B| = 0$ are called the characteristic roots of
$B$ in the metric of $A$). If $B$ is also positive definite, then such roots
are positive.

Proof: See Proposition 17, pages 581-582 of Dhrymes (1970).

A.2 Theorem 2: (Principle of Invariance of MLE) Let $\theta \in \Omega$ (an
interval in a k-dimensional Euclidean space) and let $L(\theta)$ denote the
likelihood function, a mapping from $\Omega$ to the real line $R$. Assume that
the MLE $\hat\theta$ of $\theta$ exists, so that $\hat\theta \in \Omega$ and
$L(\hat\theta) \ge L(\theta)$ for all $\theta \in \Omega$. Let $f$ be an
arbitrary transformation mapping $\Omega$ to $\Omega^*$ (an interval in an
r-dimensional Euclidean space, $1 \le r \le k$). Then $f(\hat\theta)$ is a
maximum induced likelihood estimator of $f(\theta)$.

Proof: See Zehna (1966).

A.3 Theorem 3: (Theorem of Hurwitz) Let $\{f_n(x)\}$ be a sequence of analytic
functions regular in a region $G$, and let this sequence be uniformly
convergent in every closed subset of $G$. Suppose the analytic function
$\lim_{n \to \infty} f_n(x) = f(x)$ does not vanish identically. Then if
$x = a$ is a zero of $f(x)$ of order $k$, a neighbourhood $|x - a| < \delta$
of $x = a$ and a number $N$ exist such that if $n > N$, $f_n(x)$ has exactly
$k$ zeros in $|x - a| < \delta$.

Proof: See page 22 of Szego (1939).

A.4 Definition (Isotonic regression): Let $X$ be the finite set
$\{x_1, \dots, x_k\}$ with the simple order
$x_1 \preceq x_2 \preceq \dots \preceq x_k$. A real valued function $f$ on $X$
is isotonic if $x, y \in X$ and $x \preceq y$ imply $f(x) \le f(y)$. Let $g$ be
a given function on $X$ and $w$ a given positive function on $X$. An isotonic
function $g^*$ on $X$ is an isotonic regression of $g$ with weights $w$ with
respect to the simple ordering $x_1 \preceq x_2 \preceq \dots \preceq x_k$ if
it minimizes in the class of isotonic functions $f$ on $X$ the sum
$\sum_{x \in X} [g(x) - f(x)]^2 w(x)$. When the weight function and the simple
ordering are understood, we call $g^*$ simply an isotonic regression of $g$.

Note: A binary relation "$\preceq$" on $X$ establishes a 'simple order' on $X$ if

1. it is reflexive: $x \preceq x$ for $x \in X$;

2. it is transitive: $x, y, z \in X$, $x \preceq y$, $y \preceq z$ imply $x \preceq z$;

3. it is antisymmetric: $x, y \in X$, $x \preceq y$, $y \preceq x$ imply $x = y$;

4. every two elements are comparable: $x, y \in X$ implies either $x \preceq y$
or $y \preceq x$.

Algorithms for isotonic regression: Barlow et al (1972) have discussed
in detail an algorithm called the Pool-Adjacent-Violators algorithm for
finding the isotonic regression $g^*$. They have also discussed a scheme for
programming this algorithm for a computer (see page 72 of Barlow et al).
Using essentially this scheme, Kruskal (1964) has written a program
to carry it out as a part of a larger program.
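
A compact version of the pooling idea can be sketched as follows. This is a
generic stack-based implementation written for illustration; it is not the
programming scheme of Barlow et al or Kruskal's program, and the function name
`pava` is an assumption of this sketch.

```python
def pava(g, w=None):
    """Weighted isotonic (non-decreasing) regression by pooling adjacent violators.

    g : values g(x_1), ..., g(x_k) in the simple order
    w : positive weights (all 1 when omitted)
    """
    weights = [1.0] * len(g) if w is None else list(w)
    blocks = []                      # each block: [weighted mean, weight, count]
    for value, weight in zip(g, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool while the last two blocks violate the non-decreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            total = weight1 + weight2
            pooled = (mean1 * weight1 + mean2 * weight2) / total
            blocks.append([pooled, total, count1 + count2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)   # every point in a block gets the block mean
    return fit


fit = pava([1.0, 3.0, 2.0, 4.0])                 # equal weights
weighted = pava([2.0, 1.0], [3.0, 1.0])          # unequal weights
# Regression under the reverse order: reverse, fit, reverse back.
decreasing = pava([4.0, 2.0, 3.0, 1.0][::-1])[::-1]
print(fit, weighted, decreasing)
```

Reversing the input handles the case where isotonicity is with respect to the
reverse of the natural order.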

Theorem 4: (Theorem 1.10 of Barlow et al). For $\Phi$ convex, let

$$\Delta_\Phi(g(x), f(x)) = \Delta(g, f) = \Phi(g) - \Phi(f) - (g - f)\,\varphi(f)$$

where $\varphi$ is the derivative of $\Phi$ at $f$, or if it does not have a
derivative at $f$, $\varphi(f)$ denotes any number between the left and the
right derivative at $f$. If $f$ is isotonic on $X$ and if the range of $f$ is
in $I$ then

$$\sum_x \Delta[g(x), f(x)]\,w(x) \ \ge\ \sum_x \Delta[g(x), g^*(x)]\,w(x) + \sum_x \Delta[g^*(x), f(x)]\,w(x).$$

Consequently $g^*$ minimizes $\sum_x \Delta[g(x), f(x)]\,w(x)$ in the class of
isotonic $f$ with range in $I$ and maximizes
$\sum_x \{\Phi[f(x)] + [g(x) - f(x)]\,\varphi[f(x)]\}\,w(x)$.
The minimizing (maximizing) function is unique if $\Phi$ is strictly convex.
A corollary which is useful for our purpose is

Corollary: Let $\psi_1, \dots, \psi_p$ be arbitrary real valued functions and
let $h_1, \dots, h_m$ be isotonic functions on $X$. Then $g^*$ minimizes
$\sum_x \Delta[g(x), f(x)]\,w(x)$ in the class of isotonic functions $f$ with
range in $I$ satisfying any or all of the side conditions

$$\sum_x [g(x) - f(x)]\,\psi_j[f(x)]\,w(x) = 0, \qquad j = 1,\dots,p,$$
$$\sum_x f(x)\,h_j(x)\,w(x) \ \ge\ \sum_x g(x)\,h_j(x)\,w(x), \qquad j = 1,\dots,m.$$

Proof: See corollary in page 42 of Barlow et al.
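
To see how Theorem 4 is used in Lemma 1 of Section 5.2, it may help to write
out $\Delta$ for the choice $\Phi(u) = -\ln u$ made there. The following is
only a worked specialization, assuming $g(x) > 0$ and $f(x) > 0$ so that the
logarithms are defined:

```latex
\begin{align*}
\Phi(u) &= -\ln u, \qquad \varphi(u) = \Phi'(u) = -\frac{1}{u}, \\
\Delta(g, f) &= \Phi(g) - \Phi(f) - (g - f)\,\varphi(f)
              = -\ln g + \ln f + \frac{g}{f} - 1, \\
\sum_{x} \Delta[g(x), f(x)]\,w(x)
  &= \sum_{x} \Bigl[\ln f(x) + \frac{g(x)}{f(x)}\Bigr] w(x)
     \;-\; \sum_{x} \bigl[\ln g(x) + 1\bigr] w(x).
\end{align*}
```

The second sum does not involve $f$, so minimizing $\sum_x \Delta\,w$ over
isotonic $f$ is the same as minimizing $\sum_x [\ln f(x) + g(x)/f(x)]\,w(x)$.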